In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('data/titanic.csv')

In [4]:
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [6]:
df.columns = df.columns.str.lower()

# Series, DataFrame and Index basics

## Series

![image-2.png](attachment:image-2.png)

Creating Series from:
- list: `pd.Series([])`, `pd.Series(data=[], index=[])`
- dict: `pd.Series({})`
- range: `pd.Series(range(start, stop, step))`
- random integers: `pd.Series(np.random.randint(low, high, size))`
- distribution: `pd.Series(np.random.uniform(low, high, size))`, `pd.Series(np.random.normal(loc, scale, size))`
- datetime range: `pd.Series(pd.date_range('2022-01-01', periods=365, freq='D'))`
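A minimal sketch of the constructors listed above, assuming pandas and NumPy are installed. The variable names, the sample values and the `np.random.seed(0)` call are illustrative additions, used only to make the random Series reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # illustrative: makes the random Series reproducible

s_list = pd.Series(data=[10, 20, 30], index=["a", "b", "c"])  # from a list
s_dict = pd.Series({"a": 1, "b": 2})                           # dict keys become the index
s_range = pd.Series(range(0, 10, 2))                           # 0, 2, 4, 6, 8
s_int = pd.Series(np.random.randint(0, 10, 5))                 # 5 ints in [0, 10)
s_uni = pd.Series(np.random.uniform(0.0, 1.0, 5))              # 5 floats in [0, 1)
s_norm = pd.Series(np.random.normal(0.0, 1.0, 5))              # 5 draws from N(0, 1)
s_dates = pd.Series(pd.date_range("2022-01-01", periods=365, freq="D"))  # every day of 2022
```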

[Series Basics Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/1_series_basic_ex.ipynb)

## DataFrame

[DataFrame Basics Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/2_dataframe_basic_ex.ipynb)

## Index

[Index Basics Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/3_index_basic_ex.ipynb)

# Reading Data

# Inspecting a DataFrame and Describing Data

![image-3.png](attachment:image-3.png)

## Select a sample of rows

`df.head()`  `df.tail()`  [`df.sample()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html)

In [30]:
df.sample(frac=0.01)

|   | passengerid | survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 139 | 140 | 0 | 1 | Giglio, Mr. Victor | male | 24.0 | 0 | 0 | PC 17593 | 79.2 | B86 | C |
| 856 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45.0 | 1 | 1 | 36928 | 164.8667 | NaN | S |
| 762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C |
| 876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 800 | 801 | 0 | 2 | Ponesell, Mr. Martin | male | 34.0 | 0 | 0 | 250647 | 13.0 | NaN | S |
| 785 | 786 | 0 | 3 | Harmer, Mr. Abraham (David Lishin) | male | 25.0 | 0 | 0 | 374887 | 7.25 | NaN | S |
| 807 | 808 | 0 | 3 | Pettersson, Miss. Ellen Natalia | female | 18.0 | 0 | 0 | 347087 | 7.775 | NaN | S |
| 236 | 237 | 0 | 2 | Hold, Mr. Stephen | male | 44.0 | 1 | 0 | 26707 | 26.0 | NaN | S |


## Data size and axes

`df.size`  `df.shape`  `df.axes` `df.columns` `df.index`
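A small sketch of these attributes on a hypothetical two-column frame `df_small` (illustrative, not part of the Titanic data). On the Titanic frame, `df.shape` would be `(891, 12)`, matching the `df.info()` output above.

```python
import pandas as pd

df_small = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

print(df_small.shape)    # (3, 2) -> (rows, columns)
print(df_small.size)     # 6 -> rows * columns, i.e. the number of cells
print(df_small.axes)     # [row index, column index]
print(df_small.index)    # equal to df_small.axes[0]
print(df_small.columns)  # equal to df_small.axes[1]
```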


In [31]:
df.axes

[RangeIndex(start=0, stop=891, step=1),
 Index(['passengerid', 'survived', 'pclass', 'name', 'sex', 'age', 'sibsp',
        'parch', 'ticket', 'fare', 'cabin', 'embarked'],
       dtype='object')]

## Inspecting values

`df.values`  `df.dtypes`  `df.nunique()`  `S.unique()`  `S.value_counts()`  `df.info()`  `df.select_dtypes()`

In [48]:
df.nunique()

df.age.unique()

df.dtypes

df.age.value_counts()

df.select_dtypes(exclude=np.number)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   passengerid  891 non-null    int64  
 1   survived     891 non-null    int64  
 2   pclass       891 non-null    int64  
 3   name         891 non-null    object 
 4   sex          891 non-null    object 
 5   age          714 non-null    float64
 6   sibsp        891 non-null    int64  
 7   parch        891 non-null    int64  
 8   ticket       891 non-null    object 
 9   fare         891 non-null    float64
 10  cabin        204 non-null    object 
 11  embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


## Summary statistics

[`df.describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html)  [`df.max()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.max.html)  `S.max()`  `S.min()`  `df.idxmax()`  `S.idxmin()`

In [55]:
df.fare.sum()

df.describe(include=object)

df.describe(percentiles=[0.05, 0.95])

df.fare.idxmax()
df.fare.max()

512.3292

# Tidying Data

When tidying data, we typically need to cope with the following cases:

- The names of the variables are different from what you require
- There is missing data
- Values are not in the units that you require
- The period of sampling of records is not what you need
- Variables are categorical and you need quantitative values
- There is noise in the data
- Information is of an incorrect type
- Data is organized around incorrect axes
- Data is at the wrong level of normalization
- Data is duplicated

## Processing missing data

![image.png](attachment:image.png)

### Calculating with missing data

- `mean()` - skips NaN values entirely: they are neither summed nor counted
- `sum()` - treats NaN as 0
- the sum of an empty or all-NaN Series or column is 0 (pass `min_count=1` to get NaN instead), whereas its `mean()` is NaN
- `cumsum()` and `cumprod()` ignore NaN values in the calculation, but preserve them in the resulting arrays
- the product of an empty or all-NaN Series or column is 1
- in element-wise arithmetic between objects (e.g. `df.a + df.b`), NaN propagates to the result
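The rules above can be checked on a tiny Series. This is a sketch with made-up values, assuming the default `skipna=True`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mean_value = s.mean()    # 2.0: NaN is neither summed nor counted
sum_value = s.sum()      # 4.0: NaN contributes nothing
running = s.cumsum()     # 1.0, NaN, 4.0: NaN is skipped but kept in place

all_nan = pd.Series([np.nan, np.nan])
all_nan_sum = all_nan.sum()                  # 0.0
all_nan_sum_min1 = all_nan.sum(min_count=1)  # NaN
all_nan_prod = all_nan.prod()                # 1.0
all_nan_mean = all_nan.mean()                # NaN

# Element-wise arithmetic between objects propagates NaN
added = s + pd.Series([1.0, 1.0, 1.0])       # 2.0, NaN, 4.0
```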

### Detecting missing values

`pd.isna(df)`  `df.isna()`  `df.isnull()` `pd.notna(df)`  `df.notna()`  `df.notnull()`

In [57]:
df.isnull().sum()

passengerid      0
survived         0
pclass           0
name             0
sex              0
age            177
sibsp            0
parch            0
ticket           0
fare             0
cabin          687
embarked         2
dtype: int64

### Dropping missing values

[`df.dropna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html)

`df.dropna()` - drop every row that contains at least one missing value

`df.age.dropna()` - return the `age` column as a Series without its missing values (the DataFrame itself is not changed)

`df.dropna(how='all')` - drop only the rows in which every value is NaN

`df.dropna(how='any')` - drop rows containing any NaN (this is the default)

`df.dropna(axis=1)` - drop columns that contain any NaN

`df.dropna(thresh=5, inplace=True)` - keep only rows with at least 5 non-NaN values; `inplace=True` modifies `df` instead of returning a new DataFrame

`df.dropna(subset=['age', 'cabin'])` - drop rows with NaN in any of the listed columns
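A sketch of the `dropna()` variants on a hypothetical four-row frame `df_na` (values made up, column names borrowed from Titanic). Note that `thresh` is the minimum number of *non-missing* values a row needs in order to be kept.

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({
    "age":   [22.0, np.nan, 26.0, np.nan],
    "cabin": ["C85", "B28", np.nan, np.nan],
    "fare":  [7.25, 71.28, np.nan, np.nan],
})

no_missing = df_na.dropna()                  # row 0 only (default how="any")
not_all_missing = df_na.dropna(how="all")    # rows 0, 1, 2 (row 3 is all NaN)
no_nan_cols = df_na.dropna(axis=1)           # every column has a NaN -> 0 columns left
at_least_two = df_na.dropna(thresh=2)        # rows with >= 2 non-NaN values: 0, 1
has_age = df_na.dropna(subset=["age"])       # rows where age is present: 0, 2
age_values = df_na["age"].dropna()           # a Series [22.0, 26.0]; df_na is unchanged
```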


### Filling in missing values

[`df.fillna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html)

`df.fillna(0)`

`df.age.ffill()` - propagate the last valid observation forward to the next valid one; older code writes this as `df.age.fillna(method='ffill')`, a form deprecated since pandas 2.1

`df.age.bfill()` - use the next valid observation to fill the gap (older form: `df.age.fillna(method='bfill')`)

`df.fillna(df.mean(numeric_only=True))` - fill each numeric column with its mean; `numeric_only=True` avoids errors from text columns such as `name`

`df.age.ffill(limit=3)` - fill at most 3 consecutive NaNs; a longer gap is only partially filled
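A sketch of these fill strategies on a hypothetical `age` Series with one long gap, plus a mixed-type frame `df_mixed` (both made up for illustration). In pandas 2.x, calling `df.mean()` on a frame with text columns can raise, hence `numeric_only=True`.

```python
import numpy as np
import pandas as pd

age = pd.Series([20.0, np.nan, np.nan, np.nan, np.nan, 40.0])

zeros = age.fillna(0)                  # every NaN -> 0
forward = age.ffill()                  # 20, 20, 20, 20, 20, 40
backward = age.bfill()                 # 20, 40, 40, 40, 40, 40
limited = age.ffill(limit=3)           # 20, 20, 20, 20, NaN, 40
mean_filled = age.fillna(age.mean())   # NaN -> 30.0

# On a mixed-type frame, compute means for numeric columns only
df_mixed = pd.DataFrame({"age": [20.0, np.nan], "name": ["A", None]})
df_filled = df_mixed.fillna(df_mixed.mean(numeric_only=True))  # "name" stays NaN
```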

### Interpolating missing values

[`df.interpolate()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html) 

`df.age.interpolate()`- linear interpolation by default

`S.interpolate(method='time')` - interpolate taking the actual time gaps into account; requires a `DatetimeIndex`

`df.age.interpolate(method='values')` - interpolate using the numeric values of the index, rather than treating the points as equally spaced
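A sketch contrasting the three methods on hypothetical, unevenly spaced data: default linear interpolation ignores the index, while `time` and `values` weight the fill by the distance along the index.

```python
import numpy as np
import pandas as pd

# Hypothetical readings on Jan 1, Jan 2 (missing) and Jan 4
idx = pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04"])
s = pd.Series([1.0, np.nan, 4.0], index=idx)

linear = s.interpolate()                # ignores the index: midpoint -> 2.5
by_time = s.interpolate(method="time")  # Jan 2 is 1/3 of the way -> 2.0

# method="values" does the same with a numeric index
s_num = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])
by_values = s_num.interpolate(method="values")  # 1/10 of the way -> 1.0
```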

### Replace invalid data for further NaN-based processing

`df.replace(['__', '?'], np.nan)`
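A sketch of this replace-then-process pattern on a hypothetical raw frame that uses `?` and `__` as placeholders. Once the placeholders are NaN, the usual missing-data tools and type conversions work.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data using "?" and "__" as missing-value placeholders
raw = pd.DataFrame({"age": ["22", "?", "35"], "embarked": ["S", "__", "C"]})

clean = raw.replace(["__", "?"], np.nan)
missing_per_column = clean.isna().sum()  # age: 1, embarked: 1
ages = pd.to_numeric(clean["age"])       # now convertible: 22.0, NaN, 35.0
```

When reading a CSV, the same placeholders can also be declared up front with `pd.read_csv(..., na_values=['?', '__'])`.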

## Handling duplicate data

![image.png](attachment:image.png)

[`df.duplicated()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html) - finding duplicated data

[`df.drop_duplicates()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html) - dropping duplicated data

In [59]:
df.duplicated().sum()

0
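A sketch of `duplicated()` and `drop_duplicates()` on a hypothetical frame `df_dup`, showing the `subset` and `keep` parameters. By default a row counts as a duplicate only if *all* its values repeat an earlier row.

```python
import pandas as pd

df_dup = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Ann"],
                       "age":  [30,    25,    30,    31]})

full_rows = df_dup.duplicated()                 # F, F, T, F: row 2 repeats row 0
same_name = df_dup.duplicated(subset=["name"])  # F, F, T, T
deduped = df_dup.drop_duplicates()              # keeps rows 0, 1, 3
last_per_name = df_dup.drop_duplicates(subset=["name"], keep="last")  # rows 1, 3
```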

## Converting data types

![image-2.png](attachment:image-2.png)

[`pd.to_numeric()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html)   [`pd.to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)   [`pd.to_timedelta()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html)    [`datetime.datetime.strptime()`](https://docs.python.org/3.10/library/datetime.html?highlight=strptime#datetime.datetime.strptime)  [`df.astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html)  [`df.convert_dtypes()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html)
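A sketch exercising each converter listed above on small made-up inputs. `errors="coerce"` turns unparsable values into NaN instead of raising, and `convert_dtypes()` picks pandas' nullable dtypes (here `Int64`).

```python
from datetime import datetime

import pandas as pd

raw = pd.Series(["1", "2", "x"])
nums = pd.to_numeric(raw, errors="coerce")            # unparsable "x" -> NaN
dates = pd.to_datetime(pd.Series(["2022-01-15", "2022-02-01"]))
gaps = pd.to_timedelta(["1 day", "2 hours"])          # a TimedeltaIndex
parsed = datetime.strptime("15.01.2022", "%d.%m.%Y")  # plain Python datetime
cats = pd.Series(["S", "C", "S"]).astype("category")
ints = pd.Series([1, None]).convert_dtypes()          # float64 -> nullable Int64
```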

## Transforming data

![image.png](attachment:image.png)

### Map values

[`Series.map()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html) - substitutes each value in a Series with another value, which may be derived from a function, a dict, or a Series. When the argument is a dictionary, values in the Series that are not keys of the dictionary are converted to NaN.
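A sketch of `map()` with a dict and with a function, on a hypothetical `sex` Series. The dict case shows the NaN behavior for unmapped values.

```python
import pandas as pd

sex = pd.Series(["male", "female", "unknown"])

coded = sex.map({"male": 0, "female": 1})  # 0, 1, NaN: "unknown" is not a key
lengths = sex.map(len)                     # 4, 6, 7
```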

### Replace values

[`Series.replace()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.replace.html)   [`df.replace()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html)
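A sketch of `replace()` on a hypothetical frame `df_r`. Unlike `map()` with a dict, values that are not mentioned are left unchanged, and a nested dict restricts the replacement to one column.

```python
import pandas as pd

df_r = pd.DataFrame({"embarked": ["S", "C", "Q"], "pclass": [1, 2, 3]})

partial = df_r["embarked"].replace({"S": "Southampton"})  # "C", "Q" kept as-is
df_swapped = df_r.replace({"pclass": {1: "first"}})       # only touches column pclass
```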

### Apply a function

[`Series.apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html)  [`df.apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
[`df.applymap()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html) (deprecated in pandas 2.1 in favor of `df.map()`)
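A sketch of the three granularities of `apply()` on a hypothetical frame `df_a`: per element of a Series, per column, and per row (`axis=1`). It sticks to `apply()`, which behaves the same across recent pandas versions.

```python
import pandas as pd

df_a = pd.DataFrame({"fare": [7.25, 71.28], "age": [22.0, 38.0]})

rounded = df_a["fare"].apply(round)          # per element: 7, 71
col_max = df_a.apply(lambda col: col.max())  # per column (axis=0): fare 71.28, age 38.0
row_sum = df_a.apply(lambda row: row["fare"] + row["age"], axis=1)  # per row
```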

[Missing Values Tutorial](http://localhost:8888/notebooks/Projects/pandas_workshop/missing_data.ipynb)

[Duplicated Data](http://localhost:8888/notebooks/Projects/pandas_workshop/duplicated_data.ipynb)

[Transforming Data](http://localhost:8888/notebooks/Projects/pandas_workshop/transforming_data.ipynb)

[Missing Values Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/9_handling_missing_ex.ipynb)

[Example of tidying datetime data](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/13_ufo_analysis.ipynb)

# Filtering Data

[Filtering Data](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/7_filtering_ex.ipynb)

# Grouping and Aggregating Data

![image.png](attachment:image.png)

The [groupby object](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) has four methods that accept one or more functions and perform a calculation on each group:

- [`df.groupby().agg()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html) - e.g. compute group sums or means, sizes / counts
- [`df.groupby().filter()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html) - discard some groups, e.g. filter out data based on the group sum or mean
- [`df.groupby().transform()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html) - return a result indexed like the original, e.g. standardize data within each group or fill NaNs with a value derived from each group
- [`df.groupby().apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) - apply a function group-wise
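A sketch of the four groupby methods on a hypothetical frame `df_g` with made-up fares. Note the output shapes: `agg` gives one row per group, `filter` returns a subset of the original rows, `transform` returns one value per original row, and `apply` here returns one value per group.

```python
import pandas as pd

df_g = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3],
    "fare":   [80.0, 60.0, 20.0, 10.0, 8.0],
})
grouped = df_g.groupby("pclass")["fare"]

summary = grouped.agg(["mean", "size"])                      # one row per group
kept = df_g.groupby("pclass").filter(lambda g: len(g) > 1)  # drops pclass 3 (1 row)
centered = grouped.transform(lambda s: s - s.mean())        # same length as df_g
spread = grouped.apply(lambda s: s.max() - s.min())         # one value per group
```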

## pd.Grouper

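[`pd.Grouper()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html) - lets you specify a groupby instruction, e.g. grouping by a datetime column (`key=`) at a given frequency (`freq=`), or by an index level (`level=`).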
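A sketch of weekly grouping with `pd.Grouper`, assuming a hypothetical `sales` frame with a datetime column. With `freq="W"`, bins are weeks ending on Sunday and are labeled by that Sunday.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-10"]),
    "amount": [1, 2, 10],
})

# Jan 3 and Jan 4 fall in the week ending Sunday Jan 9; Jan 10 in the next one
weekly = sales.groupby(pd.Grouper(key="date", freq="W"))["amount"].sum()
```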

## Grouping by continuous variables

Grouping makes sense for columns with discrete, repeating values. To group by a column with continuous values, first transform it into a discrete column: [binning](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut), [rounding](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.round.html), or some other [mapping](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html).
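A sketch of two of these transformations on a hypothetical `people` frame: binning with `pd.cut` (explicit, right-inclusive edges) and rounding ages down to the decade with floor division. `observed=True` keeps only categories that actually occur.

```python
import pandas as pd

people = pd.DataFrame({"age": [4, 15, 27, 35, 62], "survived": [1, 1, 1, 0, 0]})

# Binning: intervals (0, 18], (18, 40], (40, 100]
people["age_bin"] = pd.cut(people["age"], bins=[0, 18, 40, 100],
                           labels=["child", "adult", "senior"])
by_bin = people.groupby("age_bin", observed=True)["survived"].mean()

# Rounding down to the decade: 4 -> 0, 15 -> 10, 27 -> 20, ...
by_decade = people.groupby(people["age"] // 10 * 10)["survived"].mean()
```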

[Grouping and Aggregating Tutorial](http://localhost:8888/notebooks/Projects/pandas_workshop/grouping_aggregating.ipynb)

[Grouping and Aggregating Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/8_grouping_ex.ipynb)

# Combining Data Structures

## Concatenating, merging, joining, combining and comparing objects

![image-2.png](attachment:image-2.png)

### Concatenating objects

[`pd.concat()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html)

- `objs`- a sequence or mapping of Series or DataFrame objects
- `axis`- {0/’index’, 1/’columns’}, default 0
- `join`- {‘inner’, ‘outer’}, default ‘outer’
- `ignore_index`- if `True`, do not use the index values along the concatenation axis
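A sketch of the `pd.concat()` parameters above, using two hypothetical frames that share only column `x`.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5], "z": [6]})

rows_outer = pd.concat([a, b])                     # 3 rows, columns x, y, z (NaN-padded)
rows_inner = pd.concat([a, b], join="inner")       # keeps only the shared column x
renumbered = pd.concat([a, b], ignore_index=True)  # index 0, 1, 2 instead of 0, 1, 0
side_by_side = pd.concat([a, b], axis=1)           # columns x, y, x, z
```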

### Merging and joining objects

[`pd.merge()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html) - combining objects by finding **matching values**.

- `left`- DataFrame
- `right`- DataFrame or named Series
- `how`- {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
- `on`- column or index level names to join on. These must be found in both DataFrames
- `left_on`- column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame
- `right_on`- column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame
- `left_index`- use the index from the left DataFrame as the join key(s)
- `right_index`- use the index from the right DataFrame as the join key
- `suffixes`- list-like, default is (“_x”, “_y”)
- `validate`- if specified, checks if merge is of specified type:
    1. “one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets
    2. “one_to_many” or “1:m”: check if merge keys are unique in left dataset
    3. “many_to_one” or “m:1”: check if merge keys are unique in right dataset
    4. “many_to_many” or “m:m”: allowed, but does not result in checks


[`df.merge()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html)
It takes the same arguments as `pd.merge()` except `left`, which is the calling DataFrame.

[`df.join()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html) - joins on the index of `other`. The caller is joined on its own index by default, or on one of its columns via `on`; to join on a column of `other`, first set that column as its index.
Passing a list of DataFrames joins them all by index at once.

- `other`- DataFrame, Series, or a list containing any combination of them
- `on`- column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index
- `how`- {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
- `lsuffix`- suffix to use from left frame’s overlapping columns
- `rsuffix`- suffix to use from right frame’s overlapping columns
- `validate`- if specified, checks if join is of specified type:
    1. “one_to_one” or “1:1”: check if join keys are unique in both left and right datasets
    2. “one_to_many” or “1:m”: check if join keys are unique in left dataset
    3. “many_to_one” or “m:1”: check if join keys are unique in right dataset
    4. “many_to_many” or “m:m”: allowed, but does not result in checks
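A sketch of `pd.merge()`, `df.merge()` and `df.join()` on hypothetical `passengers` and `ports` frames. It also shows `validate` raising `pd.errors.MergeError` when the right side's keys are not unique.

```python
import pandas as pd

passengers = pd.DataFrame({"pid": [1, 2, 3], "port": ["S", "C", "X"]})
ports = pd.DataFrame({"port": ["S", "C", "Q"],
                      "city": ["Southampton", "Cherbourg", "Queenstown"]})

# many_to_one: each port appears once in `ports`, checked before merging
inner = pd.merge(passengers, ports, on="port", how="inner", validate="m:1")
left = passengers.merge(ports, on="port", how="left")  # unmatched "X" -> city NaN

# join: `on` names a column of the caller, matched against the index of `other`
joined = passengers.join(ports.set_index("port"), on="port")

# Duplicated keys on the right break the many_to_one promise
try:
    pd.merge(passengers, pd.concat([ports, ports]), on="port", validate="m:1")
    validation_passed = True
except pd.errors.MergeError:
    validation_passed = False
```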

[`pd.merge_ordered()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_ordered.html) - designed for ordered data like time series with optional filling / interpolation.

- `left`- DataFrame
- `right`- DataFrame
- `how`- {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘outer’
- `on`- field names to join on. Must be found in both DataFrames
- `left_on`- field names to join on in left DataFrame
- `right_on`- field names to join on in right DataFrame
- `left_by`- group left DataFrame by group columns and merge piece by piece with right DataFrame
- `right_by`- group right DataFrame by group columns and merge piece by piece with left DataFrame
- `suffixes`- list-like, default is (“_x”, “_y”)
- `fill_method`- {‘ffill’, None}, default None
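A sketch of `pd.merge_ordered()` on hypothetical daily `prices` and `volumes`. The default outer join keeps every date in order, and `fill_method="ffill"` carries the last known price forward; the first volume stays NaN because nothing precedes it.

```python
import pandas as pd

prices = pd.DataFrame({"date": pd.to_datetime(["2022-01-01", "2022-01-03"]),
                       "price": [10.0, 12.0]})
volumes = pd.DataFrame({"date": pd.to_datetime(["2022-01-02", "2022-01-03"]),
                        "volume": [100, 150]})

merged = pd.merge_ordered(prices, volumes, on="date", fill_method="ffill")
# date        price  volume
# 2022-01-01  10.0   NaN
# 2022-01-02  10.0   100.0
# 2022-01-03  12.0   150.0
```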


[`pd.merge_asof()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html) - for each row in the left DataFrame, selects the last row in the right DataFrame whose `on` key is less than or equal to the left row's key (the default `direction='backward'`). Both DataFrames must be sorted by the key.

- `left`- DataFrame or named Series
- `right`- DataFrame or named Series
- `on`- field names to join on. Must be found in both DataFrames
- `left_on`- field name to join on in left DataFrame
- `right_on`- field name to join on in right DataFrame
- `left_index`- use the index of the left DataFrame as the join key
- `right_index`- use the index of the right DataFrame as the join key
- `by`- match on these columns before performing merge operation
- `left_by`- field names to match on in the left DataFrame
- `right_by`- field names to match on in the right DataFrame
- `suffixes`- list-like, default is (“_x”, “_y”)
- `tolerance`- select asof tolerance within this range; must be compatible with the merge index
- `allow_exact_matches`- if True, allow matching with the same ‘on’ value (i.e. less-than-or-equal-to / greater-than-or-equal-to)
- `direction`- ‘backward’ (default), ‘forward’, or ‘nearest’
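A sketch of `pd.merge_asof()` matching hypothetical `trades` to the latest `quotes`, and of how `allow_exact_matches` and `tolerance` change the match. Both frames are already sorted by `time`, as required.

```python
import pandas as pd

trades = pd.DataFrame({"time": pd.to_datetime(["2022-01-01 10:00:01",
                                               "2022-01-01 10:00:05"]),
                       "qty": [5, 7]})
quotes = pd.DataFrame({"time": pd.to_datetime(["2022-01-01 10:00:00",
                                               "2022-01-01 10:00:03",
                                               "2022-01-01 10:00:05"]),
                       "bid": [99.0, 100.0, 101.0]})

backward = pd.merge_asof(trades, quotes, on="time")  # bids 99.0, 101.0 (exact match)
strict = pd.merge_asof(trades, quotes, on="time",
                       allow_exact_matches=False)    # bids 99.0, 100.0
close_only = pd.merge_asof(trades, quotes, on="time",
                           tolerance=pd.Timedelta("500ms"))  # NaN, 101.0
```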

### Combining objects

[`df.combine()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine.html) - combines column-wise with another DataFrame using a function that receives a pair of columns (Series) and returns one.

[`df.combine_first()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html) - combines two DataFrames by filling null values in one DataFrame with non-null values from the same location in the other.
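A sketch of both methods on hypothetical frames. `combine()` is given `np.maximum`, which takes two Series and returns their element-wise maximum; `combine_first()` keeps the caller's values and patches its NaNs from the other frame.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"a": [1, 5], "b": [7, 2]})
y = pd.DataFrame({"a": [3, 4], "b": [6, 9]})
elementwise_max = x.combine(y, np.maximum)  # a: 3, 5   b: 7, 9

df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 6.0]})
df2 = pd.DataFrame({"a": [3.0, 4.0], "b": [5.0, 2.0]})
patched = df1.combine_first(df2)            # a: 1.0, 4.0   b: 5.0, 6.0
```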

### Compare objects

[`df.compare()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.compare.html) - summarizes the differences between two data objects, which must be identically labeled. By default, equal values are omitted from the result, and the remaining differences are aligned on columns.
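A sketch of `compare()` on a hypothetical frame and an edited copy. Only the changed row and column survive, split into `self` and `other` sub-columns.

```python
import pandas as pd

before = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
after = before.copy()
after.loc[1, "a"] = 20

diff = before.compare(after)  # row 1, columns ("a", "self") = 2 and ("a", "other") = 20
```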

[Combining Data Tutorial](http://localhost:8888/notebooks/Projects/pandas_workshop/combining_data.ipynb)

[Joining and Merging Data Structures Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/4_joining_ex.ipynb)

[Good examples here](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_3.ipynb)

# Reshaping Data

## Pivoting, stacking, unstacking, melting and crosstab

![image-5.png](attachment:image-5.png)

### Pivoting

[`df.pivot()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html)
- `index` - column to use to make the new frame’s index
- `columns` - column to use to make the new frame’s columns
- `values` - column(s) to use for populating the new frame’s values
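A sketch of `df.pivot()` on a hypothetical long-format frame. It also shows why `pivot_table()` exists: `pivot()` cannot aggregate, so a repeated (index, columns) pair raises `ValueError`, whereas `pivot_table()` averages the duplicates.

```python
import pandas as pd

# Hypothetical readings: one row per (date, city)
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["A", "B", "A", "B"],
    "temp": [10, 12, 11, 13],
})
wide = long.pivot(index="date", columns="city", values="temp")  # 2 x 2 grid

# Add a second reading for (d1, A)
dup = pd.concat([long, pd.DataFrame({"date": ["d1"], "city": ["A"], "temp": [20]})],
                ignore_index=True)
try:
    dup.pivot(index="date", columns="city", values="temp")
    pivot_failed = False
except ValueError:
    pivot_failed = True

table = dup.pivot_table(index="date", columns="city", values="temp")  # (d1, A) -> 15.0
```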

[`df.pivot_table()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html)
- `values`: a column/list of columns to aggregate
- `index`: keys to group by on the pivot table index
- `columns`: keys to group by on the pivot table column
- `aggfunc`: aggregation function(s), `'mean'` by default
- `margins`: `True` - add special `All` columns and rows with totals

If `values` is not given, all remaining columns are aggregated.

In [21]:
pivot_df = df.pivot_table(index=['survived'], 
                          columns=['sex'], 
                          values=['name'], 
                          aggfunc='count', 
                          margins=True)
pivot_df

|   | name | name | name |
|---|---|---|---|
| **sex** | **female** | **male** | **All** |
| **survived** | | | |
| 0 | 81 | 468 | 549 |
| 1 | 233 | 109 | 342 |
| All | 314 | 577 | 891 |


### Stacking

Stacking pivots a level of the column labels into the row index.

**Unstacking** performs the opposite: pivoting a level of the row index into the column index.


[`df.stack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html):
- `level`- level(s) to stack from the column axis onto the index axis
- `dropna`- whether to drop rows in the resulting Frame/Series with missing values

[`df.unstack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html) moves a level of the row index into a level of the column axis. By default it unstacks the last level.

In [23]:
stacked_df = pivot_df.stack()
stacked_df

| survived | sex | name |
|---|---|---|
| 0 | All | 549 |
| 0 | female | 81 |
| 0 | male | 468 |
| 1 | All | 342 |
| 1 | female | 233 |
| 1 | male | 109 |
| All | All | 891 |
| All | female | 314 |
| All | male | 577 |


In [29]:
stacked_df.unstack(level=0)

|   | name | name | name |
|---|---|---|---|
| **survived** | **0** | **1** | **All** |
| **sex** | | | |
| All | 549 | 342 | 891 |
| female | 81 | 233 | 314 |
| male | 468 | 109 | 577 |


### Melting

Melting is a type of unpivoting: it changes objects from *wide* to *long* format.

[`df.melt()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html) unpivots a DataFrame from wide to long format, optionally leaving identifier variables set.

Both the function `pd.melt()` and the method `df.melt()` reshape a DataFrame so that one or more columns act as identifier variables (`id_vars`), while all other columns are unpivoted onto the row axis, leaving two non-identifier columns: `variable` and `value`.
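A sketch of `melt()` with an identifier column, on a hypothetical wide frame. It shows the default `variable`/`value` names and how `var_name`/`value_name` rename them.

```python
import pandas as pd

wide = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25], "fare": [7.25, 8.05]})

long = wide.melt(id_vars="name", value_vars=["age", "fare"])
# name  variable  value
# Ann   age       30.0
# Bob   age       25.0
# Ann   fare      7.25
# Bob   fare      8.05

renamed = wide.melt(id_vars="name", var_name="feature", value_name="amount")
```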

In [37]:
df.melt()

|   | variable | value |
|---|---|---|
| 0 | passengerid | 1 |
| 1 | passengerid | 2 |
| 2 | passengerid | 3 |
| 3 | passengerid | 4 |
| 4 | passengerid | 5 |
| ... | ... | ... |
| 10687 | embarked | S |
| 10688 | embarked | S |
| 10689 | embarked | S |
| 10690 | embarked | C |


### Crosstab

[`pd.crosstab`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html) computes a cross-tabulation of two (or more) factors. By default it computes a frequency table of the factors.

- `index`: values to group by in the rows
- `columns`: values to group by in the columns
- `values`: optional, values to aggregate; requires `aggfunc`
- `rownames`: default None; if passed, must match the number of row arrays passed
- `colnames`: default None; if passed, must match the number of column arrays passed
- `margins`: add row/column subtotals
- `normalize`: boolean, {'all', 'index', 'columns'}, or {0, 1}. Normalize by dividing all values by the sum of values.

In [47]:
df = df.assign(age_group=lambda x: pd.cut(x.age, bins=5, labels=['junior', 'young', 'middle', 'adult', 'senior']))
pd.crosstab(index=df.survived, columns=[df.age_group])

| **age_group** | junior | young | middle | adult | senior |
|---|---|---|---|---|---|
| **survived** | | | | | |
| 0 | 45 | 218 | 112 | 39 | 10 |
| 1 | 55 | 128 | 76 | 30 | 1 |
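The cell above uses the default counts. A sketch of `margins` and `normalize` on a hypothetical five-passenger frame `df_ct` follows; `normalize="index"` turns each row into proportions that sum to 1.

```python
import pandas as pd

df_ct = pd.DataFrame({"sex": ["male", "male", "female", "female", "male"],
                      "survived": [0, 1, 1, 1, 0]})

counts = pd.crosstab(df_ct["sex"], df_ct["survived"], margins=True)
# survived  0  1  All
# female    0  2    2
# male      2  1    3
# All       2  3    5

rates = pd.crosstab(df_ct["sex"], df_ct["survived"], normalize="index")
# female: 0.0, 1.0    male: 2/3, 1/3
```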


[Pivot Table Tutorial](http://localhost:8888/notebooks/Projects/pandas_workshop/pivoting_data.ipynb)

[Pivot Table Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/12_pivot_table_ex.ipynb)

[Good Examples here](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_3.ipynb)

# Data Visualization

# Working with String values

[Working with strings and regular expressions Python tutorial](http://localhost:8888/notebooks/Projects/python_workshop/working_w_strings.ipynb)

[String and Regular Expressions Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/5_string_and_re_ex.ipynb)

# Working with Datetime values

## Date and time in Python

![image.png](attachment:image.png)

In [60]:
from datetime import date, time, datetime, timedelta

## Date and time in Pandas

[Time series / date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)

![image.png](attachment:image.png)

[`pd.Timestamp()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html)
[`pd.Period()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html)

[`pd.date_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html)
[`pd.bdate_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.bdate_range.html)
[`pd.period_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.period_range.html)

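Date offsets (frequency aliases):
- B - business day
- C - custom business day
- D - calendar day
- W - week
- M - month end (`ME` in pandas 2.2+)
- MS - month start
- Q - quarter end (`QE` in pandas 2.2+)
- QS - quarter start
- A, Y - year end (`YE` in pandas 2.2+)
- AS, YS - year start (`YS` in pandas 2.2+)
- H - hours (`h` in pandas 2.2+)
- T, min - minutes (`min` in pandas 2.2+)

Anchored offsets: `W-THU` - weekly, on Thursdays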
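A sketch of the constructors and aliases above on made-up dates. The aliases used here (`W-THU`, `MS`, the default business-day frequency of `bdate_range`, and `M` for monthly *periods*) are chosen because they are not affected by the pandas 2.2 renames.

```python
import pandas as pd

ts = pd.Timestamp("2022-01-01 09:30")   # a single point in time
month = pd.Period("2022-01", freq="M")  # a whole span: January 2022

thursdays = pd.date_range("2022-01-01", periods=3, freq="W-THU")  # Jan 6, 13, 20
month_starts = pd.date_range("2022-01-15", periods=2, freq="MS")  # Feb 1, Mar 1
business = pd.bdate_range("2022-01-07", periods=3)                # Fri 7, Mon 10, Tue 11
months = pd.period_range("2022-01", periods=3, freq="M")          # 2022-01 .. 2022-03
```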

[Working with dates and time in Python Tutorial](http://localhost:8888/notebooks/Projects/python_workshop/working_w_dates.ipynb)

[Working with dates and time in Pandas Tutorial](http://localhost:8888/notebooks/Projects/pandas_workshop/dates_and_time.ipynb)

[Timeseries Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/6_timeseries_ex.ipynb)

# Other

[Table Style Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/10_table_style_ex.ipynb)

[Working with Excel data Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/exercises/11_excel_da_ex.ipynb)

# Books

## Hands-on Data Analysis with Pandas

![image.png](attachment:image.png)

**Notebooks:**

[Section 1: Getting Started with Pandas](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_1.ipynb)

[Section 2: Using Pandas for Data Analysis](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_2.ipynb)

[Aggregating Pandas DataFrames](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_3.ipynb)

[Visualizing Data with Pandas and Matplotlib](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_4.ipynb)

[Real-World Analyses Using Pandas](http://localhost:8888/notebooks/Projects/pandas_workshop/da_w_pandas/da_w_pandas_5.ipynb)

## Data Analysis Workshop



**Notebooks:**

[Bike sharing analysis](http://localhost:8888/notebooks/Projects/pandas_workshop/da_workshop/da_workshop_1.ipynb)

## Python Data Analysis

![cover_python_da.jpeg](attachment:cover_python_da.jpeg)

**Notebooks:**

[Intro](http://localhost:8888/notebooks/!_work/misc/data_analysis/python_da_packt_1.ipynb)

[NumPy arrays](http://localhost:8888/notebooks/!_work/misc/data_analysis/python_da_packt_2.ipynb)

[Pandas](http://localhost:8888/notebooks/Projects/pandas_workshop/python_da/python_da_packt_3.ipynb)

[Statistics](http://localhost:8888/notebooks/Projects/pandas_workshop/python_da/python_da_packt_4.ipynb)

[Linear Algebra](http://localhost:8888/notebooks/Projects/pandas_workshop/python_da/python_da_packt_5.ipynb)

[EDA and Data Cleaning](http://localhost:8888/notebooks/Projects/pandas_workshop/python_da/python_da_packt_6.ipynb)

## Learning Pandas
![image.png](attachment:image.png)

**Notebooks:**

[Basic objects: Series, DataFrame and Index](http://localhost:8888/notebooks/Projects/pandas_workshop/learning_pandas/part_1.ipynb#Майкл-Хейдт-Изучаем-pandas)

## Pandas Cookbook

![image.png](attachment:image.png)

**Notebooks:**

[Pandas Foundations](http://localhost:8888/notebooks/Projects/pandas_workshop/pandas_cookbook/pandas_cookbook_1.ipynb)

[Exercises](http://localhost:8888/notebooks/Projects/pandas_workshop/pandas_cookbook/pd_cookbook_exercises.ipynb)

## The Applied Data Science Workshop

![image.png](attachment:image.png)

**Notebooks:**

[Intro](http://localhost:8888/notebooks/!_work/misc/data_analysis/ds_workshop_1.ipynb)