### WHAT IS PANDAS?

*Data is an integral part of today's world: it helps us predict events and guides many of our decisions. Pandas helps us control and manipulate such data, which makes it a core skill for anyone aiming to become a Data Scientist or Data Analyst and an essential tool at the start of a beginner's journey with data.*

*Pandas provides two main data structures, Series (1D) and DataFrame (2D), for manipulating data sets and time series. Older versions also had a 3D Panel, which was deprecated in pandas 0.20 and removed in 0.25.*

*It is a free, open-source library and one of the most widely used data science libraries in the world.*

*Pandas can handle a wide range of tasks, from computing statistics such as the mean, median and mode to loading large CSV files and reshaping or filtering their contents. In short, proficiency in Pandas is a core data science skill.*

*Pandas is used for data manipulation, analysis and cleaning. Python pandas is well suited for different kinds of data, such as:*

1. *Tabular data with heterogeneously-typed columns*
2. *Ordered and unordered time series data*
3. *Arbitrary matrix data with row & column labels*
4. *Unlabelled data*
5. *Any other form of observational or statistical data sets*

In [0]:
# import pandas as pd
import pandas as pd

_Pandas has historically provided the following three data structures:_

1. _**Series**: 1D labeled homogeneous array, size-immutable._ 
2. _**DataFrame**: General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns._
3. _**Panel**: General 3D labeled, size-mutable array. Deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex is used instead._

_These data structures are built on top of NumPy arrays, so vectorized operations on them are fast._
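
A quick way to see how the two structures relate: in the sketch below (column names and values are made up for illustration), a DataFrame column comes back as a Series that shares the frame's row index, and `to_numpy()` exposes the underlying NumPy array.

```python
import pandas as pd

# Illustrative data: each column is stored with its own dtype
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})
ages = df["age"]

print(type(ages).__name__)          # Series
print(ages.index.equals(df.index))  # True: the column shares the frame's row index
print(df.dtypes)                    # one dtype per column
print(ages.to_numpy())              # the underlying NumPy array: [31 25]
```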


### pandas.Series

_Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index._




```
pandas.Series(data, index, dtype, name, copy)

data: data takes various forms like ndarray, list, dict, constants
index: Index values must be hashable and the same length as data; they need not be unique. Defaults to RangeIndex (0, 1, ..., n-1) if no index is passed.
dtype: dtype is for data type. If None, data type will be inferred
name: Optional name for the Series
copy: Whether to copy the input data
```
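
The sketch below illustrates the `index` rules listed above with throwaway values: omitting the index gives a default `RangeIndex`, and repeated labels are accepted, in which case selecting a repeated label returns every match.

```python
import pandas as pd

# Without an explicit index, pandas assigns a RangeIndex 0..n-1
s = pd.Series([10, 20, 30])
print(s.index)              # RangeIndex(start=0, stop=3, step=1)

# Labels must be hashable but need not be unique
dup = pd.Series([1, 2, 3], index=["x", "x", "y"])
print(dup.index.is_unique)  # False
print(dup["x"])             # a Series holding both rows labelled "x"
```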


In [0]:
# Empty Series (an explicit dtype avoids a warning in older pandas versions)

s = pd.Series(dtype=object)
print (s)

In [0]:
import numpy as np
# Series from a NumPy array with the default 0..n-1 index
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

In [0]:
# Series from a NumPy array with a custom index
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print (s)

In [0]:
# Series from Dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print (s)

In [0]:
# Series from a dictionary with an explicit index: labels missing from the dict ('d') become NaN
data = {'a' : 'Piyush', 'b' : 'Sneha', 'c' : 'Singla','f':'Singla'}
s = pd.Series(data,index=['b','c','d','a','f'],name = 'SeriesWithNames')
print (s)

#### SERIES INDEXING 

In [0]:
# Access the first element by position (positional s[0] on a label index is deprecated)
print (s.iloc[0])

# retrieve the first three elements
print (s.iloc[:3])

# retrieve the last three elements
print (s.iloc[-3:])

# retrieve a single element by label
print (s['a'])

# retrieve multiple elements by label
print (s[['a','c','d']])

# Get values from Series
print(s.values)

# Get Value at index from Series
print(s.values[0])
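
Because `s` above has string labels, labels and positions never collide. With an integer index that is not 0..n-1 they do, which is why `.loc` (labels) and `.iloc` (positions) both exist. A small sketch with made-up values:

```python
import pandas as pd

# Integer labels in reverse order: label 0 is the LAST row
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])

print(s.loc[0])   # label 0 -> 'c'
print(s.iloc[0])  # position 0 -> 'a'
```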

#### SERIES METHODS

In [0]:
# Count of Elements in Series: Returns number of non-NA/null observations in the Series
print(s.count())

# Gets Size of Series: Returns the number of elements in the underlying data
print(s.size)

# Name of Series
print(s.name)

# Checks if all elements in Series are Unique
print(s.is_unique)

# Sorts Series : Method is called on a Series to sort the values in ascending or descending order
print(s.sort_values(ascending = False))

# Sort index : Method is called on a pandas Series to sort it by the index instead of its values
print(s.sort_index(ascending = False))

# prints the first 5 elements (head() defaults to n=5)
print(s.head())

# prints last 4
print(s.tail(4))

# Prints unique elements in Series : unique() is used to see the unique values in a particular column
print(s.unique())

# Prints no of unique elements in Series :  Pandas nunique() is used to get a count of unique values
print(s.nunique(dropna = True))

# Prints count of each  elements in Series :  Method to count the number of the times each unique value occurs in a Series
print(s.value_counts())

# Series from Dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print (s)

# idxmax(): returns the index label of the (first) highest value in a Series
print(s.idxmax())

# idxmin(): returns the index label of the (first) lowest value in a Series
print(s.idxmin())

# astype(): changes the data type of a Series (np.int was removed in NumPy 1.24, so use int)
print(s.astype(int))

# tolist(): converts a Series to a Python list
print(s.tolist())

# to_numpy(): converts a Series to a NumPy array
print(s.to_numpy())

In [0]:
# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 3, 9], index=['a', 'b', 'd', 'e'])

# le(): element-wise "less than or equal to". Both Series are first aligned on their index labels;
# labels present in only one Series (c, e) are compared against NaN
print(data.le(data1))

# ne(): element-wise "not equal to"
print(data.ne(data1))

# ge(): element-wise "greater than or equal to"
print(data.ge(data1))

# eq(): element-wise "equal to"
print(data.eq(data1))

# gt(): element-wise "greater than"
print(data.gt(data1))

# lt(): element-wise "less than"
print(data.lt(data1))

In [0]:
# add(): element-wise addition aligned on index labels; a label present in only one Series gives NaN unless fill_value is passed
print(data.add(data1))

# sub(): element-wise subtraction (caller minus passed Series)
print(data.sub(data1))

# mul(): element-wise multiplication
print(data.mul(data1))

# div(): element-wise division (caller divided by passed Series)
print(data.div(data1))

# sum(): returns the sum of the values (NaN is skipped by default)
print(data.sum())

# prod(): returns the product of the values
print(data.prod())

# mean(): returns the mean of the values
print(data.mean())

# pow(): raises each element of the caller to the power of the matching element of the passed Series
print(data.pow(data1))

# abs(): returns the absolute value of each element
print(data.sub(data1).abs())

# cov(): covariance of two Series, computed over the index labels they share
print(data.cov(data1))
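
The arithmetic and comparison methods above align the two Series on their index labels before operating, so `c` and `e` (each present in only one Series) come out as NaN. A minimal sketch of how the `fill_value` parameter of `add()` changes that, reusing the same values:

```python
import pandas as pd

# Same values as data/data1 above; labels c and e exist on only one side
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 3, 9], index=['a', 'b', 'd', 'e'])

plain = data.add(data1)                 # c and e are NaN
filled = data.add(data1, fill_value=0)  # a missing label is treated as 0 before adding

print(plain)
print(filled)  # a 6.0, b 8.0, c 3.0, d 10.0, e 9.0
```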

In [0]:
# combine_first() combines two Series into one. The result's index is the union of both; wherever the caller has a null value, the value from the passed Series is used.
# If both Series are null at the same index, the result is null there.

# creating series 1  
series1 = pd.Series([70, 5, 0, 225, 1, 16, np.nan, 10, np.nan])  
    
# creating series 2  
series2 = pd.Series([27, np.nan, 2, 23, 1, 95, 53, 10, 5])  
  
# combining and returning results to variable 
# calling on series1 
result1 = series1.combine_first(series2)   
# calling on series2 
result2 = series2.combine_first(series1) 
  
# printing result 
print('Result 1:\n', result1, '\n\nResult 2:\n', result2) 
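
`combine_first()` also works when the two Series have different indexes: the result covers the union of both. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan], index=['a', 'b'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'b', 'c'])

# s1's non-null values win; s2 fills the gap at 'b' and supplies the extra label 'c'
combined = s1.combine_first(s2)
print(combined)  # a 1.0, b 20.0, c 30.0
```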

### pandas.DataFrame

_A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns._

#### CREATING DATAFRAME

In [0]:
# Empty DataFrame

df = pd.DataFrame()
print (df)

In [0]:
# Dataframe from List

data = [['Piyush',28],['Sneha',28],['Ruchi',24]]
df = pd.DataFrame(data,columns=['Name','Age'])
print (df)

In [0]:
# Creating a DF with random numbers

pd.DataFrame(np.random.rand(4,3)) 

In [0]:
# Dataframe from Dictionary

data = {'Name':['Piyush', 'Sneha', 'Ruchi', 'Rishabh'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['P1','P2','P3','P4'])
print (df)
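
Another common constructor input is a list of dictionaries, one per row. In the sketch below (names and values are illustrative), keys become columns and a key missing from a row becomes NaN:

```python
import pandas as pd

records = [
    {"Name": "Piyush", "Age": 28},
    {"Name": "Sneha", "Age": 28, "City": "Delhi"},  # only this row has City
]
df_records = pd.DataFrame(records)
print(df_records)  # City is NaN for the first row
```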

In [0]:
# Define a dictionary containing employee data
data = {'Name':['Piyush', 'Ruchika', 'Sneha', 'Seema','Sandeep','Abhishek','Shivi','Rohit','Sunny','Mihir','Kirti','Sachin'],
        'Age':[27, 24, 22, 32,45,32,31,34,33,21,22,47],
        'Address':['Chandigarh', 'Gurgaon', 'Delhi', 'Delhi', 'Chandigarh','Lucknow','Delhi','Ambala','Delhi','Kolkata','Indore','Nagpur'],
        'Qualification':['BTECH', 'CA', 'BCOM', 'MA', 'BE','MTECH','MBA','MCOM','PHD','CA','MBA','ARTS'],
        'Rating':[5,4.5,2,3,4,3.5,5,2,1.5,4,3,3]}
 
# Convert the dictionary into a DataFrame (dtype='float' cannot be applied here because Name, Address and Qualification hold strings)
df = pd.DataFrame(data)
 
# select two columns
print(df[['Name', 'Qualification']])

In [0]:
# Column Addition 
df['Job'] = ['IT','CA','Audit','Teacher','IT','CA','Audit','Teacher','IT','CA','Audit','Teacher']

print(df[['Name','Job']])

In [0]:
# Column Deletion 
del df['Address']

# Deletion of Row
df.drop(3, inplace = True)

In [0]:
# set a fresh 0..10 index after the row deletion (same effect as df = df.reset_index(drop=True))
df.index = range(0,11) 



```
DataFrame axis: "index" (axis=0, default) works down the rows; "columns" (axis=1) works across the columns
```
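
A minimal sketch of the axis convention above, using a throwaway two-column frame: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row.

```python
import pandas as pd

scores = pd.DataFrame({"x": [1, 2], "y": [10, 20]})

print(scores.sum(axis=0))          # per column: x 3, y 30
print(scores.sum(axis=1))          # per row: 11, 22
print(scores.sum(axis="columns"))  # same as axis=1
```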



#### FUNCTIONS

In [0]:
# information about the DF (info() prints directly and returns None, so no print() is needed)
df.info()

In [0]:
# Count of Rows and Columns
df.shape 

In [0]:
# Returns 1 if Series and 2 if DataFrame
df.ndim 

In [0]:
# Checking Index : returns index (row labels) of the DataFrame 
df.index

In [0]:
# return numpy array of dataframe
df.values

In [0]:
# Print column names
print(df.columns)

In [0]:
# renaming columns
df.columns = ['name','age','qualification','rating','job'] 

In [0]:
# rename columns back to title case
df.rename(columns = {'name':'Name','age':'Age','qualification':'Qualification','rating':'Rating','job':'Job'}, inplace = True)

In [0]:
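print(df.count())  # count(): number of non-null observations

# numeric_only=True skips text columns such as Name; without it, mean/median/std and
# row-wise sums raise a TypeError on text columns in pandas 2.x
print(df.sum(numeric_only=True))  # sum(): sum of values
print(df.sum(axis=1, numeric_only=True))
print(df['Age'].sum())

print(df.mean(numeric_only=True))  # mean(): mean of values

print(df.median(numeric_only=True))  # median(): median of values

print(df.mode())  # mode(): mode of values

print(df.std(numeric_only=True))  # std(): standard deviation of values

print(df.min())  # min(): minimum value

print(df.max())  # max(): maximum value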

In [0]:
print(df['Age'].nlargest())
print(df['Age'].nsmallest())

In [0]:
# Summary of Statistics
print(df.describe()) 

In [0]:
print (df.describe(include=['object']))

In [0]:
print(df['Age'].dtype) # Check Data Type

In [0]:
# True where Age is between 20 and 30 (both ends inclusive by default)
print(df['Age'].between(20,30))

In [0]:
df[df.Age.isin([22])] # .isin([]) : extracts rows from a DataFrame where a column value exists in a predefined collection

In [0]:
df.loc[:,df.columns.isin(['Age'])] # keeps only the columns whose names are in the list

In [0]:
df.loc[:,~df.columns.isin(['Age'])] # keeps every column except those in the list

In [0]:
df.sort_values('Age',ascending= False) # sorts df by Age column

In [0]:
df.sort_index(ascending = False) # Sort DF by index

In [0]:
df.reset_index() # moves the current index into a column and creates a new default index

In [0]:
df.reset_index(drop= True) # resets the index without keeping the old one as a column

#### NULL CHECKS

In [0]:
# CHECKING FOR NULLS

print(df.isnull()) # check for nulls

print(df['Age'].isnull()) # column based null check

print(df.notnull()) # check not null

In [0]:
df.dropna(how='any', axis =0) # Drops Row( axis = 0) for ANY NA Value
df.dropna(how='all', axis =1) # Drops Column( axis = 1) for ALL NA Value 

In [0]:
df.fillna(value=0) # Fills NA values with the given value

# Forward fill, df.ffill(): propagates the last observed non-null value forward until another non-null value is encountered.
# Backward fill, df.bfill(): propagates the next observed non-null value backward until another non-null value is met.
# (fillna(method='ffill') and fillna(method='bfill') do the same but are deprecated since pandas 2.1.)

In [0]:
pd.isna(df) # Check if NA in dataframe
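
The forward and backward fill described above are available as `ffill()` and `bfill()`. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.ffill())  # 1.0, 1.0, 1.0, 4.0: last valid value carried forward
print(s.bfill())  # 1.0, 4.0, 4.0, 4.0: next valid value carried backward
```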

#### INDEXING AND SELECTION



```
.loc[] : Label based

.iloc[] : Integer position based
```
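
One practical difference between the two indexers is slicing: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, as in ordinary Python slicing. A sketch with an illustrative frame:

```python
import pandas as pd

df_idx = pd.DataFrame({"v": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

print(df_idx.loc["a":"c"])  # label slice, end label included: a, b, c
print(df_idx.iloc[0:2])     # position slice, end position excluded: a, b
```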



In [0]:
# Selecting Column - Will Give Series

df['Age']

In [0]:
# Selecting Multiple Columns

df[['Name','Age']]

In [0]:
# Selecting Rows

df[0:2]

In [0]:
# Selecting Columns based on Labels

df.loc[:,['Age','Rating']]

In [0]:
# Selecting rows by label (0 through 5, inclusive) and columns by label

df.loc[:5,['Age','Rating']]

In [0]:
# Selecting by integer position: first 4 rows, first 2 columns

df.iloc[:4,:2]

#### FILTERING

In [0]:
# Conditional Filtering 

filter1 = df["Age"] > 25
filtered_review = df[filter1]
filtered_review

In [0]:
filter2 = (df["Age"] > 25) & (df["Rating"] > 3)
filtered_review1 = df[filter2]
filtered_review1
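
Conditions can also be combined with `|` (or) and negated with `~` (not). Python's `and`/`or`/`not` do not work element-wise on a Series, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `<`. A sketch with made-up values:

```python
import pandas as pd

people = pd.DataFrame({"Age": [22, 27, 34, 45], "Rating": [4.0, 2.5, 3.5, 5.0]})

young_or_top = people[(people["Age"] < 25) | (people["Rating"] == 5.0)]
not_young = people[~(people["Age"] < 25)]

print(young_or_top)  # ages 22 and 45
print(not_young)     # ages 27, 34 and 45
```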

In [0]:
df[df['Age']>30] # Direct Filtering

In [0]:
df.filter(items=['Rating', 'Job']) # filters columns

In [0]:
df.filter(regex='e$', axis=1) # keeps columns whose names end with 'e'

In [0]:
df.filter(like='N', axis=1) # keeps columns whose names contain 'N'

In [0]:
# where(): keeps values where the condition is True and replaces the rest with NaN

df.where(filter1)

In [0]:
df[df['Name'].str.contains('P')] # Column Contains

In [0]:
df[df['Name'].str.startswith('K')] # Column Value starts with 

In [0]:
df.replace('Sunny','Sneha',inplace = True) # Replace Value 
df

In [0]:
# Taking Sample of DataFrame 

df.sample(n= 5, random_state = 2)  # It pulls out a random sample of rows or columns from a DataFrame 

#### APPENDING TO DATAFRAME

In [0]:
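# Adding Row Data to DF (DataFrame.append() was removed in pandas 2.0, so pd.concat is used)

newRow = pd.DataFrame([{'Name' : 'Sahil' , 'Age' : 22, 'Qualification':'CA', 'Rating' : 2.5, 'Job': 'Audit'}])
modDf = pd.concat([df, newRow], ignore_index=True)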
modDf

In [0]:
# Add List of Series

listOfSeries = [pd.Series(['Raju', 21, 'BTECH', 2, 'Operator'], index=df.columns) ,
                pd.Series(['Sam', 22, 'CA', 5, 'Security'], index=df.columns) , 
                pd.Series(['Rocky', 23, 'BCOM', 1,'Entry'], index=df.columns) ]

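# each Series becomes one row; infer_objects() restores numeric dtypes for Age and Rating
modDf = pd.concat([modDf, pd.DataFrame(listOfSeries).infer_objects()], ignore_index = True, sort = False)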
modDf

In [0]:
# Appending Data Frames

data1 = {'Name':['Piyush', 'Ruchika', 'Sunny','Mihir','Kirti','Sachin'],
        'Age':[27, 24, 22, 32,45,32],
        'Qualification':['BTECH', 'CA', 'BCOM', 'MA', 'BE','MTECH'],
        'Rating':[5,4.5,2,3,4,3.5],
         'Job': ['CA','IT','Pharma','Repairs','IT',np.nan]}

data1 = pd.DataFrame(data1)

modDf = pd.concat([modDf, data1], ignore_index = True, sort = False)
modDf

In [0]:
# Concatenate Data Frames

data1 = {'Name':['Akshay', 'Jordi', 'Amit','Kishan','Jagriti','Akriti'],
        'Age':[27, 24, 22, 32,45,32],
        'Qualification':['BTECH', 'CA', 'BCOM', 'MA', 'BE','MTECH'],
        'Rating':[5,4.5,2,3,4,3.5],
         'Job': ['CA','IT','Pharma','Repairs','IT',np.nan]}

data1 = pd.DataFrame(data1)

modDf = pd.concat([modDf,data1])
modDf = modDf.reset_index(drop = True)

modDf

#### TREATING DUPLICATES

In [0]:
modDf.duplicated() # returns a Boolean Series that is True for each row repeating an earlier row

In [0]:
modDf['Name'].drop_duplicates(keep ='first')

# keep = 'first' - removes all occurrences except the first
# keep = 'last' - removes all occurrences except the last
# keep = False - removes every value that is duplicated
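
On a whole DataFrame, `duplicated()` and `drop_duplicates()` compare entire rows by default; the `subset` parameter restricts the comparison to chosen columns. A sketch with illustrative data:

```python
import pandas as pd

rows = pd.DataFrame({"Name": ["Ann", "Bob", "Ann"], "Age": [30, 25, 31]})

print(rows.duplicated(subset=["Name"]))                   # False, False, True
print(rows.drop_duplicates(subset=["Name"], keep="last"))  # Bob and the second Ann
print(rows.drop_duplicates(subset=["Name"], keep=False))   # only Bob survives
```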

#### CREATING COPY

In [0]:
# copy()	It creates an independent copy of pandas object

modDf2 = modDf.copy()
modDf2

#### INPUT & OUTPUTS



```
# OUTPUT
df.to_csv("filename")
df.to_excel("filename")
df.to_sql(table_name, connection_object)
df.to_json("filename")

# INPUT
pd.read_csv("filename")
pd.read_table("filename")
pd.read_excel("filename")
pd.read_sql(query, connection_object)
pd.read_json(json_string)
```
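
The writers and readers above are symmetric. As a sketch that avoids touching the file system, the example below writes a small illustrative frame to an in-memory buffer (`io.StringIO`) and reads it back; with a real file you would pass a path instead.

```python
import io
import pandas as pd

original = pd.DataFrame({"Name": ["Piyush", "Sneha"], "Age": [28, 28]})

buffer = io.StringIO()
original.to_csv(buffer, index=False)  # index=False: don't write the row index as a column
buffer.seek(0)                        # rewind before reading

restored = pd.read_csv(buffer)
print(restored)
```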



In [0]:
# df.to_csv('/content/sample_data/',sep=',',columns = df.columns,header = True, index = True, index_label='index', date_format='%Y-%m-%d')

In [0]:
# pd.read_csv('/content/sample_data/', sep=',', delimiter=None,
#             header='infer', names=None, index_col=None, dtype=None, engine=None, skiprows=None, na_values=None, keep_default_na=True,
#             na_filter=True, verbose=False, parse_dates=False, date_format=None)

In [0]:
# with pd.ExcelWriter(path, date_format=None, datetime_format=None, mode='w') as writer:
#     df.to_excel(writer, sheet_name='Sheet1')

#### TRANSFORMATIONS

##### GROUPING



```
df.groupby(column) – Returns a groupby object for values from one column
df.groupby([column1,column2]) – Returns a groupby object for values from multiple columns
df.groupby(column1)[column2].sum() – Returns the sum of the values in column2, grouped by the values in column1
df.groupby(column1)[column2].mean() – Returns the mean of the values in column2, grouped by the values in column1
df.groupby(column1)[column2].median() – Returns the median of the values in column2, grouped by the values in column1
```
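
Besides selecting one column and calling `.sum()` or `.mean()`, `agg()` accepts named aggregations of the form `new_name=(column, function)`, which produce clearly labelled output columns. A sketch on made-up data (the `staff` frame and the output names are hypothetical):

```python
import pandas as pd

staff = pd.DataFrame({
    "Job": ["IT", "CA", "IT", "CA", "IT"],
    "Rating": [5.0, 4.5, 4.0, 3.0, 3.0],
})

summary = staff.groupby("Job").agg(
    total_rating=("Rating", "sum"),
    mean_rating=("Rating", "mean"),
    n_ratings=("Rating", "count"),
)
print(summary)  # CA: 7.5, 3.75, 2; IT: 12.0, 4.0, 3
```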



In [0]:
modDf.groupby(['Age'])['Rating'].sum()

In [0]:
gp = modDf.groupby('Name') # a single column name (not a one-element list) lets get_group take a scalar key
gp.get_group('Sneha') # gets the rows of one group

In [0]:
modDf.groupby(['Name','Job'])['Rating'].mean()

In [0]:
gf = modDf.groupby(['Name','Job'])
gf[['Rating','Age']].agg(['sum', 'mean']) # select several columns with a list

In [0]:
gf.filter(lambda x: x['Age'].sum() >= 35) # keeps only the groups whose total Age is at least 35

In [0]:
gf.agg({'Age':'sum','Rating':'sum'})

##### PIVOTING

In [0]:
df_new = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                  'B': ['A', 'B', 'C'] * 8,
                  'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                  'D': np.random.randn(24),
                  'E': np.random.randn(24)})

pd.pivot_table(df_new, values='D', index=['B'], columns=['A', 'C'], aggfunc='sum')

In [0]:
pd.pivot_table(df_new, values=['D', 'E'], index=['B'], columns=['A', 'C'], aggfunc='sum')

In [0]:
df_new.pivot_table(index=['A', 'B'], columns='C', margins=True, aggfunc='std') # margins=True adds 'All' totals
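
`pivot_table` with `aggfunc='sum'` computes the same numbers as a `groupby` followed by `unstack()`, which can help when reasoning about what a pivot does. A small sketch with made-up sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["N", "N", "S", "S", "N"],
    "product": ["A", "B", "A", "A", "A"],
    "units": [1, 2, 3, 4, 5],
})

pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")
# Same numbers: group by both keys, sum, then move 'product' into the columns
via_groupby = sales.groupby(["region", "product"])["units"].sum().unstack()

print(pivot)        # N: A 6, B 2; S: A 7, B NaN (there are no S/B rows)
print(via_groupby)
```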

##### UNPIVOT/MELT

_The top-level melt() function and the corresponding DataFrame.melt() are useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are “unpivoted” to the row axis, leaving just two non-identifier columns, “variable” and “value”. The names of those columns can be customized by supplying the var_name and value_name parameters._

In [0]:
cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})
   
print(cheese)

cheese.melt(id_vars=['first', 'last'])

In [0]:
cheese.melt(id_vars=['first', 'last'], var_name='quantity')
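
`melt()` can be undone with `pivot()`: the `quantity` values become column names again. A sketch reusing the `cheese` frame:

```python
import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

long = cheese.melt(id_vars=['first', 'last'], var_name='quantity')
# pivot back: one row per (first, last), one column per quantity
wide = long.pivot(index=['first', 'last'], columns='quantity', values='value').reset_index()
print(wide)
```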

##### CROSSTAB

In [0]:
df = pd.DataFrame({'A': [1, 2, 2, 2, 2], 'B': [3, 3, 4, 4, 4],'C': [1, 1, np.nan, 1, 1]})

In [0]:
pd.crosstab(df['A'],df['B'])

In [0]:
pd.crosstab(df['A'],df['B'], normalize = True) # normalized values

In [0]:
pd.crosstab(df['A'], df['B'], normalize='columns') # Normalized at Column Level

In [0]:
pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='sum') # uses a third column as the values and aggregates it with the function
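
`margins=True` adds an `All` row and column holding the totals. A sketch on the same `A`/`B` data as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 2, 2], 'B': [3, 3, 4, 4, 4]})

table = pd.crosstab(df['A'], df['B'], margins=True)
print(table)  # the 'All' row/column hold the totals; the grand total is 5
```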

##### AGGREGATE

In [0]:
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])

In [0]:
df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})

##### TRANSFORM

In [0]:
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df

In [0]:
df.transform(lambda x: np.sqrt(x))

In [0]:
df.transform([np.sqrt, np.exp])
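
On a `groupby`, `transform()` is handy because it returns one value per original row rather than one per group, so the result can be assigned straight back as a column. A sketch with made-up data:

```python
import pandas as pd

scores = pd.DataFrame({"team": ["x", "x", "y"], "points": [10, 20, 6]})

# Each row receives its own group's statistic
scores["team_mean"] = scores.groupby("team")["points"].transform("mean")
scores["share"] = scores["points"] / scores.groupby("team")["points"].transform("sum")
print(scores)
```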

##### APPLY & MAP

_apply() applies a function along an axis of a DataFrame (to each column or each row), or to the values of a Series._



```
First major difference: DEFINITION
1. map is defined on Series (since pandas 2.1, DataFrame.map also exists as the new name for applymap)
2. applymap is defined on DataFrames ONLY (deprecated in pandas 2.1 in favour of DataFrame.map)
3. apply is defined on BOTH

Second major difference: INPUT ARGUMENT
1. Series.map accepts dicts, Series, or callables
2. applymap and apply accept callables

Third major difference: BEHAVIOR
1. map is elementwise for Series
2. applymap is elementwise for DataFrames
3. apply can also work elementwise but is suited to more complex operations and aggregation. Its behaviour and return value depend on the function.

Fourth major difference (the most important one): USE CASE
1. map is meant for mapping values from one domain to another, so it is optimised for that case (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
2. applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
3. apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
```
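
The styles above, sketched on an illustrative frame: `Series.map` with a dictionary (unmatched values become NaN), `Series.map` with a function, and `DataFrame.apply` with `axis=1`, where the function receives one row at a time.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

codes = df["A"].map({1: "a", 2: "b"})                      # 3 has no key -> NaN
doubled = df["A"].map(lambda v: v * 2)                     # function applied per element
row_sums = df.apply(lambda row: row["A"] + row["B"], axis=1)  # function applied per row

print(codes.tolist())     # ['a', 'b', nan]
print(doubled.tolist())   # [2, 4, 6]
print(row_sums.tolist())  # [11, 22, 33]
```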



In [0]:
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

In [0]:
df.apply(np.sum, axis=0)

In [0]:
df.apply(lambda x: [1, 2], axis=1)

In [0]:
df.apply(lambda x: [1, 2], axis=1, result_type='expand')

In [0]:
df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)

In [0]:
df.apply(max, axis=1)

In [0]:
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

s.map({'cat': 'kitten', 'dog': 'puppy'}) # When arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN. map is used for applying an element-wise function

In [0]:
df.applymap(np.sqrt) # in pandas 2.1+ the same call is df.map(np.sqrt); applymap is deprecated there

In [0]:
df.apply(lambda x: x.max() - x.min())

In [0]:
df.applymap(lambda x: x*100) # df.map(...) in pandas 2.1+

#### MERGE/JOINS



```
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)

left 
LEFT OUTER JOIN: Use keys from left frame only

right
RIGHT OUTER JOIN: Use keys from right frame only

outer
FULL OUTER JOIN: Use union of keys from both frames

inner
INNER JOIN: Use intersection of keys from both frames

```
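
The four `how` options can be compared directly on two small frames with partly overlapping keys (the data is illustrative). Passing `indicator=True` adds a `_merge` column recording whether each row came from the left frame, the right frame or both.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "B": [4, 5, 6]})

for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    print(how, merged["key"].tolist())

outer = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer[["key", "_merge"]])  # K0 left_only, K1/K2 both, K3 right_only
```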



In [0]:
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

result = pd.merge(left, right, on=['key', 'key2'])
result

#### ITERATORS

_To iterate over a DataFrame, we can use the following functions:_

_**items()**: iterate over the columns as (column name, Series) pairs (the older iteritems() was removed in pandas 2.0)_

_**iterrows()**: iterate over the rows as (index, Series) pairs_
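
A third option, not listed above, is `itertuples()`, which yields one namedtuple per row. It is generally faster than `iterrows()` and keeps each column's own dtype instead of packing the row into a single Series. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0.5, 1.5]})

# Fields are accessed by column name; the row label is in .Index
for row in df.itertuples():
    print(row.Index, row.col1, row.col2)

totals = [row.col1 + row.col2 for row in df.itertuples(index=False)]
print(totals)  # [1.5, 3.5]
```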

In [0]:
df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
print(df)

for key,value in df.items():  # iteritems() was removed in pandas 2.0
    print (key,value)

In [0]:
for row_index,row in df.iterrows():
    print (row_index,row)

In [0]:
for label, ser in df.items():
    print(label)
    print(ser)

#### CATEGORICAL DATA



```
pd.Categorical(values, categories, ordered)
```
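
Setting `ordered=True` is what makes comparisons, `min()` and `max()` follow the category order rather than alphabetical order. A sketch with illustrative clothing sizes:

```python
import pandas as pd

sizes = pd.Series(pd.Categorical(["M", "S", "L", "M"],
                                 categories=["S", "M", "L"], ordered=True))

print(sizes.cat.codes.tolist())  # [1, 0, 2, 1]: position in the categories list
print((sizes > "S").tolist())    # [True, False, True, True]: follows category order
print(sizes.max())               # L
```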



In [0]:
s = pd.Series(["a","b","c","a"], dtype="category")
print (s)

In [0]:
# 'd' is not among the listed categories, so it becomes NaN
cat=pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'],ordered=True)
print (cat)

In [0]:
# pd.factorize() method helps to get the numeric representation of an array by identifying distinct values.

pd.factorize(cat)

In [0]:
df = pd.DataFrame({'value': np.random.randint(0, 100, 10)})

labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]
df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)

df

In [0]:
values = np.random.rand(10) # uniform values in [0, 1) match the bins below; randn values outside (0, 1] would become NaN
print(values)
bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
pd.get_dummies(pd.cut(values, bins))

#### DATE TIME 

```
Alias     Description
B         business day frequency
D         calendar day frequency
W         weekly frequency
M         month end frequency
SM        semi-month end frequency
BM        business month end frequency
MS        month start frequency
SMS       semi-month start frequency
BMS       business month start frequency
Q         quarter end frequency
BQ        business quarter end frequency
QS        quarter start frequency
BQS       business quarter start frequency
A         annual (year) end frequency
BA        business year end frequency
BAS       business year start frequency
BH        business hour frequency
H         hourly frequency
T, min    minutely frequency
S         secondly frequency
L, ms     milliseconds
U, us     microseconds
N         nanoseconds

Note: pandas 2.2 deprecates several of these aliases in favour of new spellings,
e.g. M -> ME, Q -> QE, A -> YE, H -> h, T -> min, S -> s, L -> ms, U -> us, N -> ns.
```
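
To see two of these aliases side by side, the sketch below generates three dates from a Friday with `D` (every calendar day) and `B` (business days, which skip weekends). The start date is arbitrary.

```python
import pandas as pd

calendar = pd.date_range("2024-01-05", periods=3, freq="D")  # 2024-01-05 is a Friday
business = pd.date_range("2024-01-05", periods=3, freq="B")

print(calendar.day_name().tolist())  # Friday, Saturday, Sunday
print(business.day_name().tolist())  # Friday, Monday, Tuesday
```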



In [0]:
# Creating Date Range
rng = pd.date_range('1/1/2012', periods=10, freq='1M')
rng

In [0]:
start = pd.Timestamp(2011, 1, 1)  # pd.datetime was removed in pandas 2.0
end = pd.Timestamp(2011, 1, 5)

pd.date_range(start, end)

In [0]:
ts = pd.Series(np.random.randint(0, 10, len(rng)), index=rng)
ts.resample('15D').sum()

In [0]:
rng.to_period('S')

In [0]:
# Creating Time Data
pd.timedelta_range(0, periods=10, freq='H')

In [0]:
pd.to_datetime(['2018-01-05', '7/8/1952', 'Oct 10, 1995'], format='mixed') # each string has a different format; pandas 2.x needs format='mixed' here

In [0]:
# Parsing dates with an explicit format
df = pd.to_datetime(['2/25/10', '8/6/17', '12/15/12'], format='%m/%d/%y')
df

In [0]:
# Extracting Date Features
print(df.year)
print(df.month)
print(df.weekday)
print(df.dayofyear)
print(df.days_in_month)

In [0]:
df = pd.DataFrame({'Value': np.random.randn(20)}, index = pd.date_range('2020/01/31', periods= 20, freq='D') )

In [0]:
df

In [0]:
df.resample('W').mean() # Resampling the Data

In [0]:
df['Value'].rolling(window = 7, center = True).mean() # Finding Rolling mean 

In [0]:
df['Value'].rolling(window = 7).std()  # Finding Rolling Std
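
A rolling window returns NaN until it has seen `window` observations; `min_periods` lowers that threshold. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

print(s.rolling(window=2).mean().tolist())                 # [nan, 1.5, 2.5, 3.5]
print(s.rolling(window=2, min_periods=1).mean().tolist())  # [1.0, 1.5, 2.5, 3.5]
```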

#### VISUALIZATION

In [0]:
df = pd.DataFrame(np.random.randn(10,4),index=pd.date_range('1/1/2000',
   periods=10), columns=list('ABCD'))

df.plot()

In [0]:
df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
df.plot.bar()

In [0]:
df.plot.bar(stacked=True)

In [0]:
df.plot.barh(stacked=True)

In [0]:
df = pd.DataFrame({'a': np.random.randn(1000) + 1,
                   'b': np.random.randn(1000),
                   'c': np.random.randn(1000) - 1})

df.plot.hist(bins=20)

In [0]:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot.box()

In [0]:
df.plot.area()

In [0]:
df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.plot.scatter(x='a', y='b')

#### CUSTOMIZATION & OPTIONS

In [0]:
print (pd.get_option("display.max_rows"))
print (pd.get_option("display.max_columns"))

pd.set_option("display.max_rows",80)
pd.set_option("display.max_columns",80)

print (pd.get_option("display.max_rows"))
print (pd.get_option("display.max_columns"))
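
To change an option only temporarily, `pd.option_context` restores the previous value when the `with` block ends. A minimal sketch:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# The option is changed only inside the with-block
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)  # inside is 5; after equals before
```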