<h2 style="color:Blue;"> Advanced Pandas Tricks and Techniques </h2>

----

<h3 style="color:Blue;">Index of learning</h3>  <a id="00"> </a>


---
1. [**Basic Pandas Reading Method**](#1)
1. [**Advanced Pandas Reading Methods**](#2)
    1. [**Manipulating Column & Index Locations and Names**](#21)
    1. [**Data Parsing options**](#22)
    1. [**Reading data from excel files**](#23)
    1. [**Reading data from some other popular formats**](#24)
1. [**Apply multiple filter criteria to a pandas DataFrame**](#3)
1. [**Changing the datatype of a Pandas Series**](#4)
1. [**Filter rows of a pandas DataFrame by column value**](#5)
1. [**Selecting multiple rows and columns from a pandas DataFrame**](#6)
1. [**Sorting a pandas DataFrame or a Series**](#7)
1. [**Using pandas Series data structure to select a subset of the data**](#8)
1. [**Using string methods in pandas**](#9)
1. [**Using the axis parameter in pandas**](#10)
1. [**Applying a function to a pandas Series or DataFrame** ](#11)
1. [**Handling SettingWithCopyWarning**](#12)
1. [**Handling missing values in pandas**](#13)
1. [**Indexing in pandas dataframes**](#14)
1. [**Merging and concatenating multiple data frames into one** ](#15)
1. [**Modifying a Pandas Dataframe inplace**](#16)
1. [**Removing columns from a pandas DataFrame**](#17)
1. [**Renaming columns in a pandas DataFrame**](#18)
1. [**Using groupby method**](#19)
1. [**Work with dates and times data**](#20)
1. [**Choosing the colors for the plots**](#211)
1. [**Controlling plot aesthetics**](#221)
1. [**Plotting categorical data**](#231)
1. [**Plotting with data aware grids**](#241)
 
---


In [None]:
%%html
<style>
.output_wrapper, .output {
    height:auto !important;
    max-height:350px;  /* your desired max-height here */
}
.output_scroll {
    box-shadow:none !important;
    -webkit-box-shadow:none !important;
}
</style>

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# 1.Basic Pandas Reading Method <a id="1"> </a> 
 ---
 [**Go to top**](#00)
 
 ![](https://python-graph-gallery.com/wp-content/uploads/Pandas_Cheat_Datacamp.png)
 ![](https://ugoproto.github.io/ugo_py_doc/img/scipy_cs/Pandas_Cheat_Sheeta.png)
 ![](https://cdn-images-1.medium.com/max/2000/1*YhTbz8b8Svi22wNVvqzneg.jpeg) 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
warnings.simplefilter(action='ignore', category=UserWarning)

In [None]:
df = pd.read_csv("../input/datasetsdifferent-format/IMDB.csv", encoding="ISO-8859-1")
df.head()

# 2.Advanced Pandas Reading Methods <a id="2"></a>
---
 [**Go to top**](#00)
 
 ![](https://i.stack.imgur.com/qCOaK.png)
 
* [**Advanced Pandas Reading Methods**](#2)
    * [**Manipulating Column & Index Locations and Names**](#21)
    * [**Data Parsing options**](#22)
    * [**Reading data from excel files**](#23)
    * [**Reading data from some other popular formats**](#24)

> ### 2.1 Manipulating Columns & Index Location and Names <a id="21"></a>

### 1. No Header and No Columns
* The file is read with `header=None`, so **no row is used as the header** and the columns are numbered from 0. `encoding` is set because the file is in `ISO-8859-1` format.

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", header=None)
df.head()

### 2.Specify a different row as header
* `header` selects which **row supplies the column names**.
* In the result, row 2 (counting from 0) becomes the header of the dataframe; the rows above it are discarded.

In [None]:
df = pd.read_csv("../input/datasetsdifferent-format/IMDB.csv", encoding = "ISO-8859-1", header=2)
df.head()

### 3.Specify a column as index
* `index_col` sets a column as the index of the dataframe.
* In the result, the `Title` column becomes the index.

In [None]:
df = pd.read_csv("../input/datasetsdifferent-format/IMDB.csv", encoding = "ISO-8859-1", index_col="Title")
df.head()

### 4.Choose only a subset of columns to be read
* `usecols` reads only a subset of the columns from the file.
* In the result, only the `Title, Genre1, Genre2, Budget` columns are loaded.

In [None]:
df = pd.read_csv("../input/datasetsdifferent-format/IMDB.csv", encoding = "ISO-8859-1", usecols=['Title','Genre1','Genre2','Budget'])
df.head()
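
The column and index options above can be tried without the IMDB file. The following is a minimal sketch on a made-up three-row CSV held in memory (`csv_text` and the variable names are assumptions for illustration, not part of the dataset):

```python
import io
import pandas as pd

# Hypothetical stand-in for IMDB.csv
csv_text = "Title,Genre1,Budget\nAvatar,Action,237\nTitanic,Drama,200\nUp,Animation,175\n"

# header=None: the header line is read as data and columns are numbered 0..n-1
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# usecols + index_col: read two columns and use one of them as the index
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Title", "Budget"], index_col="Title")

print(no_header.shape)
print(subset.loc["Up", "Budget"])
```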

### 5.Handling Missing and NA data

***Missing value formats:*** by default, pandas reads the following strings as `NaN`: `''` (empty string), `#N/A`, `#N/A N/A`, `#NA`, `-1.#IND`, `-1.#QNAN`, `-NaN`, `-nan`, `1.#IND`, `1.#QNAN`, `N/A`, `NA`, `NULL`, `NaN`, `nan`.

* `na_values` lists additional strings to treat as missing while reading the data.
* Here `'nan'` is passed explicitly; it is already in the default list above, so the result matches the default behaviour.

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", na_values=['nan'])
display(df.head())
print(df.shape)
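
`na_values` matters most when a file uses its own marker for missing data. A small sketch, assuming a hypothetical file that writes `unknown` for a missing budget:

```python
import io
import pandas as pd

# Hypothetical data: "unknown" is a custom marker, "NA" is one of the defaults
csv_text = "Title,Budget\nAvatar,237\nTitanic,unknown\nUp,NA\n"

df_default = pd.read_csv(io.StringIO(csv_text))
df_custom = pd.read_csv(io.StringIO(csv_text), na_values=["unknown"])

# Without na_values the custom marker stays a string and the column is not numeric
print(df_default["Budget"].isna().sum(), df_default["Budget"].dtype)
print(df_custom["Budget"].isna().sum(), df_custom["Budget"].dtype)
```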

### 6.Choose whether to skip over blank rows or not

* `skip_blank_lines` controls whether blank rows are skipped while reading the data.
* With `skip_blank_lines=False`, as below, blank rows are kept and appear as rows of `NaN`; the default (`True`) skips them.


In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", skip_blank_lines=False)
df.head()
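
The difference is easiest to see on a tiny input with one blank line. This is only a sketch on made-up data:

```python
import io
import pandas as pd

# Hypothetical file with a blank line between the two data rows
csv_text = "a,b\n1,2\n\n3,4\n"

skipped = pd.read_csv(io.StringIO(csv_text))                       # default: blank line dropped
kept = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=False)  # blank line becomes a NaN row

print(len(skipped), len(kept))
```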

> ### 2.2 Data Parsing options <a id="22"> </a>

### 1. Skip Rows
* `skiprows` skips the given rows while reading the dataset.
* A list is interpreted as 0-based line numbers in the file (line 0 is the header), so lines `1,3,7` are left out of the dataframe.

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", skiprows = [1,3,7])
df.head()

### 2.Skip rows from footer or from end of the file

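* `skipfooter` skips rows counted from the end of the file; it is only supported by `engine='python'`.
* In the result, the two rows shown by the first `tail(2)` no longer appear after the file is read again.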

In [None]:
df.tail(2)
print("After Skipping the Rows")
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", skipfooter=2, engine='python')
df.tail(2)

### 3.Reading only a subset of the file or a certain number of rows

* `nrows` reads only the first N rows instead of the whole file.
* In the result, compare the shape of the data before and after.

In [None]:
print("Before Shape:",df.shape)
print("After Selecting 100 Rows")
df = pd.read_csv('../input/datasetsdifferent-format/IMDB.csv', encoding = "ISO-8859-1", nrows=100)
print("After Shape:",df.shape)
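
The three row-limiting options in this subsection can be compared side by side. A minimal sketch on a hypothetical one-column file (header `n`, then the values 0 to 9):

```python
import io
import pandas as pd

# Hypothetical file: header "n" followed by the values 0..9
csv_text = "n\n" + "\n".join(str(i) for i in range(10))

# skiprows uses 0-based file line numbers, so 1 and 3 drop the values 0 and 2
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])

# skipfooter is only supported by the Python engine
trimmed = pd.read_csv(io.StringIO(csv_text), skipfooter=2, engine="python")

# nrows stops after the first three data rows
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=3)

print(skipped["n"].tolist())
print(trimmed["n"].tolist())
print(first_rows["n"].tolist())
```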

> ### 2.3.Reading data from excel files <a id="23"></a>

### 1.Basic Excel read
* Basic Excel file reading with default sheet number

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx')
df.head()

## Advanced read options

`pandas.read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)`

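***Reference:*** [Pandas Doc](http://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.read_excel.html) (signature as of pandas 0.20). Later releases renamed `sheetname` to `sheet_name`, `parse_cols` to `usecols`, and `skip_footer` to `skipfooter`.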

### 2.Which Sheet to read?
* `sheet_name` selects which sheet to read, by position or by name.

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=0)
df.head()

### 3.Reading data from multiple sheets in an excel file
* Find out the sheet list of the excel file

In [None]:
df_excel = pd.ExcelFile('../input/datasetsdifferent-format/IMDB.xlsx')
df_excel.sheet_names

In [None]:
df1 = df_excel.parse('movies')
df2 = df_excel.parse('by genre')
df1.head()
df2.head()

### 4.Choose Header or column labels

* The `header` argument of `read_excel()` selects which row supplies the column labels

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=1, header=3)
df.head()

### 5.No header
* Set `header=None` when no row should be used as the header

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=1, header=None)
df.head()

### 6.Skip Rows at the beginning of the file
* Skip the rows

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=1, skiprows=7)
df.head(10)

### 7.Skip rows from the end of the file

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=1, skipfooter=10)
df.tail(10)

### 8.Choose Columns
* we can choose column from the excel file

In [None]:
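# Current pandas no longer accepts an integer for usecols; list the first three columns explicitly
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=0, usecols=[0, 1, 2])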
df.head()

### 9.Column Names

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=0, usecols=[0, 1, 2], names=['X', 'Title', 'Rating'])
df.head()

### 10.Set an Index while reading data

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name=0, index_col='Title')
df.head()

### 11.Handle missing data while reading

In [None]:
df = pd.read_excel('../input/datasetsdifferent-format/IMDB.xlsx', sheet_name= 0, na_values=['nan']) ## as per missing value
df.head()

> ### 2.4.Reading data from some other popular formats <a id="24"></a>
### 1.Reading JSON data into Pandas

In [None]:
movies_json = pd.read_json('../input/datasetsdifferent-format/IMDB.json')
movies_json.head()

### 2.Reading HTML data

In [None]:
# read_html returns a list of DataFrames, one per <table> found in the page
df = pd.read_html('../input/datasetsdifferent-format/IMDB.html')
# df[0].head()

### 3.Read pickle file

In [None]:
df = pd.read_pickle('../input/datasetsdifferent-format/IMDB.p')
df.head()

### 4.Read SQL file

In [None]:
import sqlite3

In [None]:
conn = sqlite3.connect("../input/datasetsdifferent-format/IMDB.sqlite")
df = pd.read_sql_query("SELECT * FROM IMDB;", conn)
df.head()

### 5.Read data from clipboard

In [None]:
# df = pd.read_clipboard()
# # df.head()

# 3.Apply multiple filter criteria to a pandas DataFrame<a id="3"></a>
---
 [**Go to top**](#00)
 
 ![](https://docs.microsoft.com/en-us/dynamics365/customer-engagement/social-engagement/media/data-set-concept-social-engagement.png)
 ### In this section, you will learn
1. Filter using `&` **AND Operator.**
1. Filter using `|`  **OR Operator.**
1. Filtering using *`isin`* **method**
1. Using ***`isin` method*** with multiple conditions
 
###  1.Read in the dataset

In [None]:
data_zillow = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data_zillow.head()

### 2. Filter Based on Multiple Conditions

In [None]:
data_zillow[(data_zillow['Zhvi'] > 1000000) & (data_zillow['State'] == 'NY')].head()

In [None]:
data_zillow[((data_zillow['State'] == 'CA') | (data_zillow['State'] == 'NY'))].head()

In [None]:
zillow_filter = data_zillow['Metro'].isin(['New York','San Diego'])
data_zillow[zillow_filter].head()

In [None]:
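# DataFrame.isin returns a boolean frame; reduce it to one boolean per row so it filters rows
zillow_filter1 = data_zillow[['State', 'Metro']].isin({'State': ['CA'], 'Metro': ['San Francisco']}).all(axis=1)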
data_zillow[zillow_filter1].head()
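
The same filters can be checked on a frame small enough to read at a glance. A sketch on made-up rows (the `homes` frame is an assumption, with column names borrowed from the Zillow data):

```python
import pandas as pd

# Hypothetical miniature of the Zillow data
homes = pd.DataFrame({
    "State": ["NY", "CA", "NY", "TX"],
    "Metro": ["New York", "San Diego", "Buffalo", "Austin"],
    "Zhvi": [1200000, 900000, 300000, 400000],
})

# Each condition needs its own parentheses because & and | bind tighter than comparisons
expensive_ny = homes[(homes["Zhvi"] > 1000000) & (homes["State"] == "NY")]

# isin replaces a chain of == comparisons joined with |
ca_or_ny = homes[homes["State"].isin(["CA", "NY"])]

# A dict passed to DataFrame.isin checks each column against its own list
ca_san_diego = homes[homes[["State", "Metro"]].isin({"State": ["CA"], "Metro": ["San Diego"]}).all(axis=1)]

print(len(expensive_ny), len(ca_or_ny), len(ca_san_diego))
```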

# 4.Changing the datatype of a Pandas Series <a id="4"></a>
---
[**Go to Top**](#00)

![](https://cdn-images-1.medium.com/max/1600/1*oErPCXv1PFcuuizXqGEEbw.png)
### In this section you will learn
1. Change data type from int to float
2. Changing datatype while reading data
3. Converting string to datetime


#### 1.Read Dataset

In [None]:
data_zillow = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data_zillow.head()

#### 2. Change data type from int to float

In [None]:
data_zillow.dtypes

In [None]:
data_zillow['Zhvi'] = data_zillow.Zhvi.astype(float)

In [None]:
data_zillow.dtypes

### 3.Changing datatype while reading data
* The `dtype` parameter of the reading function sets the data type of any column while loading, as in the example below

In [None]:
data_zillow1 = pd.read_csv('../input/datasetsdifferent-format/data-zillow.csv', sep=',', dtype={'Zhvi':float})
data_zillow1.dtypes

### 4.Converting string to datetime
* `pd.to_datetime()` converts a *`date`* column stored as strings to the datetime type

In [None]:
pd.to_datetime(data_zillow1.Date,infer_datetime_format=True).head()
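
Both conversions can be verified through `dtypes` on a small frame. A minimal sketch with made-up values (the `prices` frame is hypothetical):

```python
import pandas as pd

prices = pd.DataFrame({"Date": ["2017-05-31", "2017-06-30"], "Zhvi": [1000, 2500]})

# astype returns a new Series, so assign it back to keep the change
prices["Zhvi"] = prices["Zhvi"].astype(float)

# to_datetime turns the strings into timestamps, which unlocks the .dt accessor
prices["Date"] = pd.to_datetime(prices["Date"])

print(prices.dtypes)
print(prices["Date"].dt.month.tolist())
```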

# 5.Filter rows of a pandas DataFrame by column value <a id="5"></a>
---
 [**Go to top**](#00)

![](http://104.236.88.249/wp-content/uploads/2016/10/Pandas-selections-and-indexing.png)

### In this section, you will learn
1. Filtering Method by using `filter()`
2. Filtering Method by Regular expression in `filter()` function
3. Filter data using boolean indexing
4. An alternative way to filter

#### 1. Read Dataset

In [None]:
data = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data.head()

#### 2.Filter columns by name using filter()
* `filter(items=[...])` keeps only the listed columns; it selects by label and never looks at the values

In [None]:
filtered_data = data.filter(items=['State', 'Metro'])
filtered_data.head(6)

#### 3.Filter columns by regular expression using filter()

In [None]:
filtered_data = data.filter(regex='Region', axis=1)
filtered_data.head()

#### 4.Filter data using boolean indexing

In [None]:
price_filter_series = data['Zhvi'] > 500000
price_filter_series.head()

In [None]:
data[price_filter_series].head()

#### 5.An alternative way to filter

In [None]:
data[data.Zhvi >= 1000000].head()
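
It helps to keep in mind that `filter()` selects columns by label, while boolean indexing selects rows by value. A short sketch on a hypothetical frame:

```python
import pandas as pd

# Made-up frame with the same kind of column names as the Zillow data
frame = pd.DataFrame({
    "RegionID": [1, 2],
    "RegionName": ["A", "B"],
    "State": ["NY", "CA"],
    "Zhvi": [600000, 400000],
})

by_items = frame.filter(items=["State", "Zhvi"])   # exact column labels
by_regex = frame.filter(regex="^Region")           # labels matching a pattern
rows = frame[frame["Zhvi"] > 500000]               # rows whose value passes the test

print(list(by_items.columns), list(by_regex.columns), len(rows))
```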

# 6.Selecting multiple rows and columns from a pandas DataFrame <a id="6"> </a>
---
 [**Go to top**](#00)
 
 
### In this section you will learn:

1. Select single row, single column
1. Select single row, multiple columns
1. Select single row, all columns
1. Select multiple rows, single column
1. Select multiple rows and multiple contiguous columns
1. Select multiple rows and multiple non-contiguous columns
1. Select multiple rows and all columns
1. Select non-contiguous rows
1. Selecting rows based on a specific column's value
1. Selecting all rows for a specific column based on a value of another column

### 1.Read dataset

In [None]:
data_zillow = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data_zillow.head()

### 2.Select single row, single column

In [None]:
data_zillow.loc[7, 'Metro']

In [None]:
data_zillow.iloc[7,4]

### 3.Select single row, multiple columns

In [None]:
data_zillow.loc[7, ['Metro', 'County']]

In [None]:
data_zillow.iloc[7, [4,5]]

### 4.Select single row, all columns

In [None]:
data_zillow.loc[11, :]

### 5.Select multiple rows, single column

In [None]:
data_zillow.loc[101:105, 'Metro']

### 6.Select multiple rows and multiple contiguous columns

* **`loc`** selects by label; a label slice such as `201:204` includes both endpoints.
* **`iloc`** selects by integer position; a position slice such as `201:205` excludes the end.

In [None]:
data_zillow.loc[201:204, "State":"County"]

In [None]:
data_zillow.iloc[201:205, 3:6]
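
The endpoint behaviour is the usual source of off-by-one surprises, so here is a tiny check. It is a sketch on a made-up frame with the default integer index:

```python
import pandas as pd

frame = pd.DataFrame({"x": range(10), "y": range(10, 20)})

by_label = frame.loc[2:4, "x"]     # label slice: rows 2, 3 and 4
by_position = frame.iloc[2:4, 0]   # position slice: rows 2 and 3 only

print(by_label.tolist(), by_position.tolist())
```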

### 7.Select multiple rows and multiple non-contiguous columns

In [None]:
data_zillow.loc[201:205, ['RegionName', 'State']]

### 8.Select multiple rows and all columns

In [None]:
data_zillow.loc[201:205, :]

### 9.Select non-contiguous rows

In [None]:
data_zillow.loc[[0,5,10], :]

### 10.Selecting rows based on a specific column's value

In [None]:
data_zillow.loc[data_zillow.County=="Queens"]

### 11.Selecting all rows for a specific column based on a value of another column

In [None]:
data_zillow.loc[data_zillow.Metro=="New York", "County"].head()

# 7.Sorting a pandas DataFrame or a Series <a id="7"></a>
---
[**Go to top**](#00)

![](https://www.notquitesusie.com/wp-content/uploads/2012/10/farmers-market-coloring-sorting-set.jpg)

### In this section you can learn:

1. Simple sort
1. Changing the sort order
1. Sort by more than one column
1. Sort by multiple columns and mixed ascending order
1. Sort a Series

### 1.Read dataset

In [None]:
data_zillow = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data_zillow.head()

### 2.Simple sort
* Sort the rows by the values of a column

In [None]:
data_zillow.sort_values('Metro').head()

### 3.Changing the sort order
* Pass `ascending=False` to sort in descending order

In [None]:
# Avoid the name `sorted`, which would shadow the Python builtin
sorted_data = data_zillow.sort_values('Metro', ascending=False)
sorted_data.head()

### 4.Sort by more than one column

In [None]:
sorted_data = data_zillow.sort_values(by=['Metro','County'])
sorted_data.head()

### 5.Sort by multiple columns and mixed ascending order

In [None]:
sorted_data = data_zillow.sort_values(by=['Metro','County', 'Zhvi'],
                                      ascending=[True, True, False])
sorted_data.head()

### 6.Sort a Series

* **Let's create a Series object**

In [None]:
regions = data_zillow.RegionID
type(regions)

**Let's sort the series**
* **1.Original Series**

In [None]:
regions.head()

* **2.Sorted**

In [None]:
regions.sort_values().head()
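
The mixed-order sort from this section is easier to follow on four rows. A minimal sketch with made-up values:

```python
import pandas as pd

homes = pd.DataFrame({"Metro": ["B", "A", "A", "B"], "Zhvi": [1, 2, 3, 4]})

# One flag per sort key: Metro ascending, then Zhvi descending within each Metro
ordered = homes.sort_values(by=["Metro", "Zhvi"], ascending=[True, False])

print(ordered["Metro"].tolist(), ordered["Zhvi"].tolist())
```

`sort_values` returns a new object and keeps the original index labels, which is why `ordered.index` is no longer 0, 1, 2, 3.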

# 8.Using pandas Series data structure to select a subset of the data <a id="8"></a>
---
[**Go to top**](#00)

![](https://image.slidesharecdn.com/talk-120111102959-phpapp01/95/a-look-inside-pandas-design-and-development-23-728.jpg)

### In this Section, you will learn below topics

1. Select data
    * Select a Series with bracket notation
2. DataFrame vs Series
    * Multi Column Selection - Series or DataFrame
    * Select using dot notation
3. Creating a new series by selection

### 1.Read Dataset

In [None]:
data = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')

In [None]:
data.head()

### 2.Select data
* **Select a Series with bracket notation**

In [None]:
regions = data['RegionName']
type(regions)

In [None]:
regions.head()

### 3.DataFrame vs Series
* **Multi Column Selection - Series or DataFrame**

In [None]:
region_n_state = data[['RegionName', 'State']]
region_n_state.head()

In [None]:
type(region_n_state)

* **Select using dot notation**

In [None]:
data.State.head()
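
Two details are worth checking by hand: single versus double brackets, and the limits of dot notation. The sketch below uses a hypothetical frame with a column deliberately named `count`:

```python
import pandas as pd

frame = pd.DataFrame({"RegionName": ["A", "B"], "count": [1, 2]})

as_series = frame["RegionName"]     # single brackets: a Series
as_frame = frame[["RegionName"]]    # double brackets: a one-column DataFrame

# Dot notation finds the DataFrame.count method first, not the column,
# so bracket notation is the safe choice for such names
is_method = callable(frame.count)
column_values = frame["count"].tolist()

print(type(as_series).__name__, type(as_frame).__name__, is_method, column_values)
```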

### 4.Creating a new series by selection

In [None]:
data['Address'] = data.County + ', ' + data.Metro + ', ' + data.State

In [None]:
data.Address.head()

# 9.Using string methods in pandas <a id="9"></a>
---
[**Go To Top**](#00)

### In this section, you will learn
1. Check for a substring
2. Make values of a series or column uppercase
3. Make values lowercase
4. Get the length of each value in a column
5. Remove all whitespace from the beginning
6. Replace parts of a column's values

### 1. Read dataset

In [None]:
data = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data.head()

### 2.Check for a substring

In [None]:
data.RegionName.str.contains('New').head()

### 3.Make values of a series or column uppercase

In [None]:
data.RegionName.str.upper().head()

### 4.Make values lowercase


In [None]:
data.RegionName.str.lower().head()

### 5.Get the length of each value in a column

In [None]:
data.County.str.len().head()

### 6.Remove all whitespace from the beginning

In [None]:
data.RegionName.str.lstrip().head()

### 7.Replace parts of a column's values

In [None]:
data.RegionName.str.replace(' ', '').head()
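
String methods can be chained, and they pass missing values through instead of raising. A sketch on a made-up Series (the values are assumptions chosen to show stray whitespace and a missing entry):

```python
import pandas as pd

names = pd.Series(["  new york", "San Diego ", None])

# Each .str call returns a Series, so calls can be chained
cleaned = names.str.strip().str.title()

# na=False makes missing entries count as "no match" in the boolean result
has_new = names.str.contains("new", case=False, na=False)

print(cleaned.tolist())
print(has_new.tolist())
```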

# 10.Using the axis parameter in pandas<a id="10"></a>
---
[**Go To TOP**](#00)

![](https://www.dataquest.io/blog/content/images/2017/12/axis_diagram.jpg)

### In this section, you can learn

1. Usage of axis parameter
2. axis usage examples
    * axis = 0
    * axis = 1
    * use labels instead of 0 and 1

### 1.Read Dataset

In [None]:
data = pd.read_table('../input/datasetsdifferent-format/data-zillow.csv', sep=',')
data.head()

### 2.Usage of axis parameter

In [None]:
data.head()

In [None]:
data.axes

### 1.**axis = 0**

In [None]:
data.mean(axis=0, numeric_only=True)  # skip the text columns, which have no mean

### 2.axis = 1

In [None]:
data.mean(axis=1, numeric_only=True).head()

### 3.use labels instead of 0 and 1

In [None]:
data.mean(axis='rows', numeric_only=True)

In [None]:
data.mean(axis='columns', numeric_only=True).head()

In [None]:
data.drop(0, axis=0).head()

In [None]:
data.drop('Date', axis=1).head()
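
A helpful way to read `axis` is as the direction that gets collapsed: `axis=0` works down the rows and leaves one value per column, `axis=1` works across the columns and leaves one value per row. A sketch on a small made-up frame:

```python
import pandas as pd

scores = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col_means = scores.mean(axis=0)      # one mean per column
row_means = scores.mean(axis=1)      # one mean per row
without_b = scores.drop("b", axis=1) # for drop, axis says where the label lives

print(col_means.tolist(), row_means.tolist(), list(without_b.columns))
```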

# 11.Applying a function to a pandas Series or DataFrame<a id="11"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/AqYhv.png)

### In this section, you can learn

1. Apply functions using apply()
2. Apply functions using applymap()
3. Applying our own functions

### 1.Read dataset

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-titanic.csv')
data.head()

### 2.Apply functions using apply()

In [None]:
func_lower = lambda x: x.lower()
data.Name.apply(func_lower).head()

### 3.Apply functions using applymap()

In [None]:
data[['Age', 'Pclass']].applymap(np.square).head()

### 4.Applying our own functions

In [None]:
def my_func(i):
    return i + 20

In [None]:
data[['Age', 'Pclass']].applymap(my_func).head()
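
`apply` behaves differently on a Series and on a DataFrame, which is easy to mix up. A minimal sketch on a hypothetical three-column frame:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 40], "Pclass": [1, 3]})

# On a Series, the function receives one value at a time
lower = people["Name"].apply(str.lower)

# On a DataFrame, the function receives a whole column (axis=0, the default) ...
col_max = people[["Age", "Pclass"]].apply(lambda col: col.max())

# ... or a whole row when axis=1
row_sum = people[["Age", "Pclass"]].apply(lambda row: row["Age"] + row["Pclass"], axis=1)

print(lower.tolist(), col_max.tolist(), row_sum.tolist())
```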

# 12.Handling SettingWithCopyWarning<a id="12"></a>
---
[**Go To TOP**](#00)

![](https://www.dataquest.io/blog/content/images/view-vs-copy.png)

### In this section, you can learn

1. A SettingWithCopyWarning scenario
2. Handling the SettingWithCopyWarning

### 1.A SettingWithCopyWarning scenario

In [None]:
data[data.Age.isnull()].Age = data.Age.mean()

### 2.Handling the SettingWithCopyWarning

In [None]:
data[data.Age.isnull()].Age.head()

In [None]:
# Call mean() so the number is assigned, not the method object
data.loc[data.Age.isnull(), 'Age'] = data.Age.mean()

In [None]:
data[data.Age.isnull()]
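
The pattern can be checked on three rows. This sketch uses a made-up `passengers` frame, not the Titanic file:

```python
import numpy as np
import pandas as pd

passengers = pd.DataFrame({"Age": [20.0, np.nan, 40.0]})

# One .loc call selects the rows and the column together,
# so the assignment reaches the original frame and not a temporary copy
missing = passengers["Age"].isnull()
passengers.loc[missing, "Age"] = passengers["Age"].mean()

print(passengers["Age"].tolist())
```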

# 13.Handling missing values in pandas<a id="13"></a>
---
[**Go To TOP**](#00)

![](https://cdn-images-1.medium.com/max/1600/1*_RA3mCS30Pr0vUxbp25Yxw.png)

### In this section, you can learn

1. Missing Records 
    1. Find out total records in the dataset
    1. Number of valid records per column
2. Dropping missing records
    1. Drop all records that have one or more missing values
    1. Drop only those rows that have all records missing
3. Fill in missing data
    1. Fill in missing data with zeros
    1. Fill in missing data with a mean of the values from other rows

In [None]:
data = pd.read_csv("../input/datasetsdifferent-format/data-titanic.csv")

### 1. Missing Records 
1. **Find out total records in the dataset**

In [None]:
data.shape

2. **Number of valid records per column**

In [None]:
data.count()

### 2. Dropping missing records

1. **Drop all records that have one or more missing values**


In [None]:
data_missing_dropped = data.dropna()
data_missing_dropped.shape

2. **Drop only those rows that have all records missing**

In [None]:
data_all_missing_dropped = data.dropna(how="all")
data_all_missing_dropped.shape

### 3. Fill in missing data
    
1. **Fill in missing data with zeros**

In [None]:
data_filled_zeros =  data.fillna(0)
data_filled_zeros.count()

2. **Fill in missing data with a mean of the values from other rows**

In [None]:
data_filled_in_mean = data.copy()
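# Assign the result back instead of calling fillna(inplace=True) on a selected column
data_filled_in_mean['Age'] = data_filled_in_mean['Age'].fillna(data['Age'].mean())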
data_filled_in_mean.count()
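
The drop and fill options above give different row counts, which a three-row frame makes visible. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"Age": [20.0, np.nan, np.nan], "Fare": [7.5, 8.0, np.nan]})

any_missing_dropped = frame.dropna()            # keeps only fully populated rows
all_missing_dropped = frame.dropna(how="all")   # drops only rows where every value is missing

# A dict limits the fill to the named column; Fare keeps its missing value
filled = frame.fillna({"Age": frame["Age"].mean()})

print(len(any_missing_dropped), len(all_missing_dropped))
print(filled["Age"].tolist(), filled["Fare"].isna().sum())
```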

# 14.Indexing in pandas dataframes<a id="14"></a>
---
[**Go To TOP**](#00)

![](https://bookdata.readthedocs.io/en/latest/_images/base_01_pandas_5_0.png)

### In this section, you can learn

1. Default Index
2. Set an Index post reading of data
3. Set an Index while reading data
4. Selection using Index
5. Reset Index

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-titanic.csv')

### 1.Default Index

In [None]:
data.head()

### 2. Set an Index post reading of data

In [None]:
data.set_index('Name').head()

### 3. Set an Index while reading data

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-titanic.csv', index_col=3)
data.head()

### 4. Selection using Index

In [None]:
data.loc['Braund, Mr. Owen Harris',:]

### 5. Reset Index

In [None]:
data.reset_index(inplace=True)

In [None]:
data.head()
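
Setting and resetting an index is a round trip, as this small sketch on a hypothetical two-row frame shows:

```python
import pandas as pd

frame = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 40]})

indexed = frame.set_index("Name")    # Name moves from the columns into the index
age = indexed.loc["Bob", "Age"]      # label-based lookup through the new index
restored = indexed.reset_index()     # Name comes back as an ordinary column

print(age, list(restored.columns))
```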

# 15.Merging and concatenating multiple data frames into one<a id="15"></a>
---
[**Go To TOP**](#00)

![](https://cdn-images-1.medium.com/max/1600/1*uG1vjoSQj7gMm8craCj2xA.png)

### In this section, you can learn

1. Concatenate Dataset DataFrames
2. Concatenate using append()
3. Concatenate on columns
4. Merging DataFrames
5. Left outer merge
6. Right outer merge
7. Full outer merge

### 1. Concatenate Dataset DataFrames


In [None]:
dataset1 = pd.DataFrame({'Age': ['32', '26', '29'],
                         'Sex': ['F', 'M', 'F'],
                         'State': ['CA', 'NY', 'OH']},
                         index=['Jane', 'John', 'Cathy'])
    
dataset2 = pd.DataFrame({'Age': ['34', '23', '24', '21'],
                         'Sex': ['M', 'F', 'F', 'F'],
                         'State': ['AZ', 'OR', 'CA', 'WA']},
                         index=['Dave', 'Kris', 'Xi', 'Jo'])

In [None]:
pd.concat([dataset1, dataset2])

### 2. Concatenate using append()

In [None]:
# DataFrame.append() was removed in pandas 2.0; pd.concat() is the supported equivalent
pd.concat([dataset1, dataset2])

### 3. Concatenate on columns


In [None]:
dataset1 = pd.DataFrame({'Age': ['32', '26', '29'],
                         'Sex': ['F', 'M', 'F'],
                         'State': ['CA', 'NY', 'OH']},
                         index=['Jane', 'John', 'Cathy'])

dataset2 = pd.DataFrame({'City': ['SF', 'NY', 'Columbus'],
                         'Work Status': ['No', 'Yes', 'Yes']},
                         index=['Jane', 'John', 'Cathy'])


pd.concat([dataset1, dataset2], axis=1)

### 4. Merging DataFrames

In [None]:
dataset1 = pd.DataFrame({'Name': ['Jane', 'John', 'Cathy', 'Sarah'],
                         'Age': ['32', '26', '29', '23'],
                         'Sex': ['F', 'M', 'F', 'F'],
                         'State': ['CA', 'NY', 'OH', 'TX']})

dataset2 = pd.DataFrame({'Name': ['Jane', 'John', 'Cathy', 'Rob'],
                        'City': ['SF', 'NY', 'Columbus', 'Austin'],
                         'Work Status': ['No', 'Yes', 'Yes', 'Yes']})

pd.merge(dataset1, dataset2, on='Name', how='inner')

### 5. Left outer merge

In [None]:
pd.merge(dataset1, dataset2, on='Name', how='left')

### 6. Right outer merge

In [None]:
pd.merge(dataset1, dataset2, on='Name', how='right')

### 7. Full outer merge

In [None]:
pd.merge(dataset1, dataset2, on='Name', how='outer')
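
When comparing the join types, the `indicator` option of `pd.merge` shows where each row came from. A minimal sketch with two made-up frames (`left` and `right` are hypothetical names):

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Jane", "Sarah"], "Age": [32, 23]})
right = pd.DataFrame({"Name": ["Jane", "Rob"], "City": ["SF", "Austin"]})

# indicator=True adds a _merge column: both, left_only or right_only
merged = pd.merge(left, right, on="Name", how="outer", indicator=True)
origin = dict(zip(merged["Name"], merged["_merge"]))

inner = pd.merge(left, right, on="Name", how="inner")

print(origin)
print(inner["Name"].tolist())
```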

# 16.Modifying a Pandas Dataframe inplace<a id="16"></a>
---
[**Go To TOP**](#00)

### In this section, you can learn

1. Modify without inplace
2. Modify inplace
3. inplace not required for every method

In [None]:
top_movies = pd.read_table('../input/datasetsdifferent-format/data-movies-top-grossing.csv', sep=',')

In [None]:
top_movies.head()

### 1.Modify without inplace

In [None]:
top_movies.set_index('Rank').head()

In [None]:
top_movies.head()

### 2.Modify inplace

In [None]:
top_movies.set_index('Rank', inplace=True)

In [None]:
top_movies.head()


### 3.inplace not required for every method

In [None]:
top_movies.rename(columns = {'Year': 'Release Year'}).head()
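
The difference between the two styles comes down to what the call returns. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

movies = pd.DataFrame({"Rank": [1, 2], "Title": ["A", "B"]})

# Without inplace: a new frame is returned and the original is untouched
result = movies.set_index("Rank")
still_a_column = "Rank" in movies.columns

# With inplace=True: the original is modified and the call returns None
returned = movies.set_index("Rank", inplace=True)

print(still_a_column, returned, movies.index.name)
```

Because an in-place call returns `None`, writing `movies = movies.set_index('Rank', inplace=True)` would replace the frame with `None`.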

# 17.Removing columns from a pandas DataFrame <a id="17"></a>
---
[**Go To TOP**](#00)

![](https://i1.wp.com/cmdlinetips.com/wp-content/uploads/2018/04/How_To_Drop_Columns_in_Pandas.jpg)

### In this section, you can learn

1. Remove one column
2. Remove more than one column
3. Remove row(s)

### 1.Remove one column

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-titanic.csv', index_col=3)
data.head()

In [None]:
data.drop('Ticket', axis=1, inplace=True)

In [None]:
data.head()

### 2.Remove more than one column

In [None]:
data.drop(['Parch', 'Fare'], axis=1, inplace=True)
data.head()

### 3.Remove row(s)

In [None]:
data.drop(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'], inplace=True)

In [None]:
data.head()

# 18.Renaming columns in a pandas DataFrame <a id="18"></a>
---
[**Go To TOP**](#00)

![](https://image.slidesharecdn.com/datamanagementinpython-170925110242/95/data-management-in-python-19-638.jpg)

### In this section, you can learn

1. Rename columns while reading the data
2. Rename columns using rename method 
    1. Read in the dataset again 
    2. Rename
3. Rename all columns

### 1.Rename columns while reading the data

In [None]:
list_columns = ['Date', 'Region ID', 'Region Name', 'State',
             'City', 'County', 'Size Rank','Price']
# header=0 tells pandas to replace the file's own header row instead of reading it as data
data = pd.read_csv('../input/datasetsdifferent-format/data-zillow1.csv', names=list_columns, header=0)
data.head()
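
When a file already has a header row, `names` alone does not replace it. The sketch below, on a made-up two-column CSV, shows the effect of adding `header=0`:

```python
import io
import pandas as pd

# Hypothetical file with its own header line
csv_text = "RegionName,Metro\nA,X\n"

# names only: the original header line is kept as the first data row
names_only = pd.read_csv(io.StringIO(csv_text), names=["Region", "City"])

# names + header=0: the original header line is replaced
replaced = pd.read_csv(io.StringIO(csv_text), names=["Region", "City"], header=0)

print(len(names_only), len(replaced))
```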

### 2.Rename columns using rename method
1. **Read in the dataset again**

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-zillow1.csv')
data.head()

2. **Rename**

In [None]:
data.columns

In [None]:
data.rename(columns={'RegionName':'Region', 'Metro':'City'}, inplace=True)

In [None]:
data.columns

### 3.Rename all columns

In [None]:
data.columns = ['Date', 'Region ID', 'Region Name', 'State',
             'City', 'County', 'Size Rank','Price']

# 19.Using groupby method <a id="19"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/sgCn1.jpg)

### In this section, you can learn

1. Get Mean price for every State
2. Split the data into groups
3. Apply a function on each group and combine the results
4. Get Descriptive statistics by Groups(States)
5. Group by data on State and Region
6. Get the number of records per State
7. Group by Columns
8. Iterate over Groups

In [None]:
data = pd.read_csv('../input/datasetsdifferent-format/data-zillow1.csv')
data.head()

### 1.Get Mean price for every State

In [None]:
grouped_data = data[['State', 'Price']].groupby('State').mean()
grouped_data.head()

### 2.Split the data into groups

In [None]:
grouped_data = data[['State', 'Price']].groupby('State')
grouped_data.head(2)

### 3.Apply a function on each group and combine the results

In [None]:
grouped_data.mean().head()

### 4.Get Descriptive statistics by Groups(States)

In [None]:
grouped_data.describe().head()

### 5.Group by data on State and Region

In [None]:
grouped_data = data[['State',
                     'RegionName', 
                     'Price']].groupby(['State','RegionName']).mean()
grouped_data.head()

### 6.Get the number of records per State

In [None]:
grouped_data = data.groupby(['State']).size()
grouped_data.head()

### 7.Group by Columns

In [None]:
grouped_data = data.groupby(data.dtypes, axis=1)
# list(grouped_data)

### 8.Iterate over Groups

In [None]:
# for state, grouped_data in data.groupby('State'):
#     print(state, '\n', grouped_data)
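
Split, apply and combine can be seen end to end on three rows. A minimal sketch on a made-up `homes` frame:

```python
import pandas as pd

homes = pd.DataFrame({"State": ["NY", "NY", "CA"], "Price": [100, 300, 200]})

# agg with a list computes several statistics per group in one pass
summary = homes.groupby("State")["Price"].agg(["mean", "size"])

# Iterating a groupby yields (key, sub-frame) pairs
sizes = {state: len(group) for state, group in homes.groupby("State")}

print(summary)
print(sizes)
```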

# 20.Work with dates and times data <a id="20"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/Zfni3.jpg)

### In this section, you can learn

1. let's first convert our date column to datetime
2. Let's set index to the date column
3. Filter and select time series Data
4. Get properties of date-time series data

### 1.let's first convert our date column to datetime

In [None]:
dataset = pd.DataFrame({'DOB': ['1976-06-01', '1980-09-23', '1984-03-30', '1991-12-31', '1994-10-2', '1973-11-11'],
                        'Sex': ['F', 'M', 'F', 'M', 'M', 'F'],
                        'State': ['CA', 'NY', 'OH', 'OR', 'TX', 'CA'],
                        'Name': ['Jane', 'John', 'Cathy', 'Jo', 'Sam', 'Tai']})
dataset

In [None]:
dataset.dtypes

In [None]:
dataset.DOB = pd.to_datetime(dataset.DOB)

In [None]:
dataset.dtypes

### 2.Let's set index to the date column

In [None]:
dataset.set_index('DOB', inplace=True)

In [None]:
dataset

### 3.Filter and select time series Data

In [None]:
dataset.loc['1980']  # current pandas requires .loc for partial-string row selection

In [None]:
dataset.sort_index().loc['1980':]  # slicing by date needs a sorted index

In [None]:
dataset.sort_index().loc[:'1980']

In [None]:
display(dataset.sort_index().loc['1980':'1984'])
dataset.reset_index(inplace=True)
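
Partial-string selection works on any `DatetimeIndex`. The following sketch uses a made-up three-row frame and sorts the index first, since date slices expect it to be ordered:

```python
import pandas as pd

people = pd.DataFrame({
    "DOB": pd.to_datetime(["1984-03-30", "1976-06-01", "1980-09-23"]),
    "Name": ["Cathy", "Jane", "John"],
})
people = people.set_index("DOB").sort_index()

born_1980 = people.loc["1980"]    # every row whose date falls in 1980
from_1980 = people.loc["1980":]   # 1980 and later

print(born_1980["Name"].tolist(), from_1980["Name"].tolist())
```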

### 4.Get properties of date-time series data

In [None]:
dataset.DOB.dt.dayofyear

In [None]:
dataset.DOB.dt.day_name()  # weekday_name was removed in pandas 1.0

# 21.Choosing the colors for the plots <a id="211"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/dLUh4.png)

### In this section, you can learn

1. Color Palettes
2. Look at how these colors look on a plot
3. Change the color palette
4. Impact on the plot
5. seaborn palettes
6. matplotlib colormaps as color palettes
7. Let's set the palette to a matplotlib colormap
8. Impact on the plot
9. Building custom color palettes
10. Let's see how the plot has changed

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/data-alcohol.csv')
df.head()

### 1. Color Palettes

In [None]:
sns.palplot(sns.color_palette())

### 2.Look at how these colors look on a plot

In [None]:
plt.figure(figsize = (15,8))
sns.set()
sns.boxplot(data=df);

### 3. Change the color palette

In [None]:
sns.set_palette("bright")

### 4. Impact on the plot

In [None]:
plt.figure(figsize = (15,8))
sns.boxplot(data=df);

### 5. seaborn palettes

In [None]:
sns.palplot(sns.color_palette("deep", 7))
sns.palplot(sns.color_palette("muted", 7))
sns.palplot(sns.color_palette("pastel", 7))

sns.palplot(sns.color_palette("bright", 7))
sns.palplot(sns.color_palette("dark", 7))
sns.palplot(sns.color_palette("colorblind", 7))

### 6. matplotlib colormaps as color palettes

In [None]:
sns.palplot(sns.color_palette("RdBu", 7))
sns.palplot(sns.color_palette("Blues_d", 7))

### 7. Let's set the palette to a matplotlib colormap

In [None]:
sns.set_palette("Blues_d")

### 8. Impact on the plot

In [None]:
plt.figure(figsize = (15,8))
sns.boxplot(data=df);

### 9. Building custom color palettes

In [None]:
my_palette = ['#4B0082', '#0000FF', '#00FF00', '#FFFF00', '#FF7F00', '#FF0000']
sns.set_palette(my_palette)
sns.palplot(sns.color_palette())
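A custom palette is just a list of colors. `sns.color_palette` converts hex codes to RGB tuples, and the returned object offers `as_hex()` to go back. Here is a minimal sketch using three of the hex codes above; the variable names are assumptions for illustration.

```python
import seaborn as sns

hex_codes = ['#4B0082', '#0000FF', '#00FF00']

# Build a palette object without changing the global default
palette = sns.color_palette(hex_codes)

print(palette[1])        # pure blue as an RGB tuple: (0.0, 0.0, 1.0)
print(palette.as_hex())  # the same colors as hex strings
```

Passing the palette to a single plot (for example `sns.boxplot(data=df, palette=palette)`) is an alternative to `sns.set_palette` when only one figure should change.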

### 10. Let's see how the plot has changed

In [None]:
plt.figure(figsize = (15,8))
sns.boxplot(data=df);

# 22.Controlling plot aesthetics <a id="221"></a>
---
[**Go To TOP**](#00)

![](https://tgmstat.files.wordpress.com/2013/11/tips1.png)

### In this section, you can learn

1. First plot with seaborn
2. Changing the plot style with set_style
	1. Set plot background to a white grid
	1. Set the plot background to dark
	1. Set the background to white
	1. Adding ticks
3. Customizing the styles
	1. Style parameters
4. Plotting Context Presets
	1. Plotting Context Preset - paper
	1. Plotting Preset - talk
	1. Plotting Preset - poster

### 1. First plot with seaborn

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
df = pd.read_csv('../input/datasetsdifferent-format/data-alcohol.csv')
df.head()

In [None]:
sns.histplot(df.beer_servings, kde=True)  # histplot replaces the deprecated distplot (seaborn >= 0.11)

### 2. Changing the plot style with set_style
#### 1. Set plot background to a white grid

In [None]:
sns.set()
sns.set_style("whitegrid")
sns.lmplot(x='beer_servings', y='wine_servings', data=df);

#### 2. Set the plot background to dark

In [None]:
sns.set()
sns.set_style("dark")
sns.lmplot(x='beer_servings', y='wine_servings', data=df, fit_reg=False);

#### 3.Set the background to white

In [None]:
# sns.set()
sns.set_style("white")
plt.figure(figsize=(15,8))
sns.swarmplot(x='country', y='wine_servings', data=df);

#### 4.Adding ticks

In [None]:
plt.figure(figsize=(15,8))
sns.set_style("ticks")
sns.boxplot(data=df);

### 3.Customizing the styles
#### 1.Style parameters

In [None]:
sns.axes_style()

In [None]:
plt.figure(figsize=(15,8))
sns.set_style("ticks", {"axes.facecolor": ".1"})
sns.boxplot(data=df);
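`sns.axes_style` returns a plain dictionary of matplotlib settings, so a style can be inspected before it is applied. The second argument overrides individual keys, which is what `set_style("ticks", {...})` does above. This is a small sketch; the variable names are assumptions for illustration.

```python
import seaborn as sns

dark = sns.axes_style("darkgrid")
white = sns.axes_style("white")
custom = sns.axes_style("ticks", {"axes.facecolor": ".1"})

# Grid lines are on for "darkgrid" and off for "white"
print(dark["axes.grid"], white["axes.grid"])

# The override replaces the preset's background color
print(custom["axes.facecolor"])
```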

### 4.Plotting Context Presets
#### 1.Plotting Context Preset - paper

In [None]:
sns.set()
sns.set_context("paper")
# lmplot creates its own figure, so size it with height/aspect instead of plt.figure
sns.lmplot(x='beer_servings', y='wine_servings', data=df, height=8, aspect=15/8);

#### 2.Plotting Preset - talk

In [None]:
sns.set()
sns.set_context("talk")
# lmplot creates its own figure, so size it with height/aspect instead of plt.figure
sns.lmplot(x='beer_servings', y='wine_servings', data=df, height=6, aspect=8/6);

#### 3.Plotting Preset - poster

In [None]:
sns.set()
sns.set_context("poster")
# lmplot creates its own figure, so size it with height/aspect instead of plt.figure
sns.lmplot(x='beer_servings', y='wine_servings', data=df, height=6, aspect=8/6);
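The context presets differ only in how large they scale fonts, lines, and markers. `sns.plotting_context` returns the settings as a dictionary, so the presets can be compared without drawing anything. This is a minimal sketch; the exact numbers depend on the installed seaborn version, so none are listed here.

```python
import seaborn as sns

# Base font size each preset would apply, from smallest to largest
presets = ["paper", "notebook", "talk", "poster"]
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in presets}
print(font_sizes)

# A preset can also be applied to one block of plotting code only
with sns.plotting_context("talk"):
    pass  # plotting calls placed here would use the "talk" sizes
```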

# 23.Plotting categorical data <a id="231"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/IsxzL.png)

### In this section, you can learn

1. Scatterplots
2. Swarmplot
3. Boxplot
4. Violinplot
5. Barplot
6. Countplot
7. Wide form plots

#### 1.Scatterplots

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/data_simpsons_episodes.csv')
df.head()

In [None]:
plt.figure(figsize=(20,8))
sns.stripplot(x="season", y="us_viewers_in_millions", data=df);

#### 2.Swarmplot

In [None]:
plt.figure(figsize=(20,8))
sns.swarmplot(x="season", y="us_viewers_in_millions", data=df);

#### 3.Boxplot

In [None]:
plt.figure(figsize=(20,8))
sns.boxplot(x="season", y="us_viewers_in_millions", data=df);
# sns.boxenplot(x="season", y="us_viewers_in_millions", data=df);

#### 4.Violinplot

In [None]:
plt.figure(figsize=(20,8))
sns.violinplot(x="season", y="us_viewers_in_millions", data=df);

#### 5.Barplot

In [None]:
plt.figure(figsize=(20,8))
sns.barplot(x="season", y="us_viewers_in_millions", data=df);

#### 6.Countplot

In [None]:
plt.figure(figsize=(20,8))
sns.countplot(x="season", data=df);
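By default `barplot` draws the mean of `y` for each category, and `countplot` draws the number of rows per category. The same numbers can be checked directly in pandas. Here is a small sketch on made-up data; the `episodes` frame and its values are assumptions, not the Simpsons dataset.

```python
import pandas as pd

# Hypothetical viewer figures for two seasons
episodes = pd.DataFrame({
    "season": [1, 1, 1, 2, 2],
    "us_viewers_in_millions": [10.0, 12.0, 14.0, 8.0, 10.0],
})

# Bar heights that barplot would draw (mean per season)
mean_viewers = episodes.groupby("season")["us_viewers_in_millions"].mean()

# Bar heights that countplot would draw (rows per season)
episode_counts = episodes["season"].value_counts().sort_index()

print(mean_viewers)
print(episode_counts)
```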

#### 7.Wide form plots

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/data-alcohol.csv')
df.head()

In [None]:
plt.figure(figsize=(20,8))
sns.boxplot(data=df, orient="h");
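In wide form each numeric column becomes its own box, whereas the earlier plots used long form, with one column naming the category and another holding the value. `DataFrame.melt` converts between the two. This is a minimal sketch; the `wide` frame is made-up data that only borrows the alcohol dataset's column names.

```python
import pandas as pd

# Hypothetical wide-form data: one column per drink type
wide = pd.DataFrame({
    "country": ["A", "B"],
    "beer_servings": [10, 20],
    "wine_servings": [5, 15],
})

# Long form: one row per (country, drink) pair
long_form = wide.melt(id_vars="country", var_name="drink", value_name="servings")
print(long_form)
```

The long-form frame could then be plotted with `x="drink", y="servings"`, in the same style as the season plots.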

# 24.Plotting with data aware grids <a id="241"></a>
---
[**Go To TOP**](#00)

![](https://i.stack.imgur.com/YsSZc.png)

### In this section, you can learn

1. Plotting with FacetGrid()
2. Plotting with PairGrid()
	1. MLB Players Height, Weight, Age and Positions dataset
3. Plot it with PairGrid()
4. Plotting with PairPlot()

### 1. Plotting with FacetGrid()

In [None]:
df = pd.read_csv('../input/datasetsdifferent-format/data-titanic.csv')
df.head()

In [None]:
g = sns.FacetGrid(df, col="Sex", hue='Survived')
g.map(plt.hist, "Age");
g.add_legend();

### 2.Plotting with PairGrid()
#### MLB Players Height, Weight, Age and Positions dataset

In [None]:
mlb = pd.read_csv('../input/datasetsdifferent-format/data-mlb-players.csv')
mlb.head()

### 3.Plot it with PairGrid()

In [None]:
g = sns.PairGrid(mlb, vars=["Height", "Weight"], hue="Position")
g.map(plt.scatter);
g.add_legend();

### 4.Plotting with PairPlot()

In [None]:
sns.pairplot(mlb, hue="Position", height=2.5);  # `size` was renamed to `height` in seaborn 0.9

### <span style="color:orange">Thanks for Reading this notebook...🙏...🙏...🙏!!!</span>