# Pandas Tutorial 

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### Why is Pandas used?

Pandas is mainly used for data analysis. It can import data from many file formats, such as comma-separated values (CSV), JSON, SQL databases, and Microsoft Excel, and it supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
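As a minimal preview of the merging and reshaping operations mentioned above, the sketch below joins two small tables on a shared key and then summarises the result with `pivot_table`. Both tables and their values are made up for illustration; they are not part of any dataset used later in this notebook.

```python
import pandas as pd

# Made-up example tables (hypothetical data, not from any dataset in this notebook)
cars = pd.DataFrame({"car_id": [1, 2, 3], "fuel": ["Petrol", "Diesel", "Petrol"]})
prices = pd.DataFrame({"car_id": [1, 2, 3], "price": [3.35, 4.75, 7.25]})

# Merging: combine the two tables on their shared key column
merged = cars.merge(prices, on="car_id", how="inner")

# Reshaping: one row per fuel type holding the mean price
by_fuel = merged.pivot_table(index="fuel", values="price", aggfunc="mean")

print(merged)
print(by_fuel)
```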

### Objectives of this notebook 

What is a DataFrame?<br>
What is a Series?<br>
Different operations in Pandas

In [None]:
# Import the required libraries
import numpy as np
import pandas as pd

In [None]:
df = pd.DataFrame(np.arange(0,20).reshape(4,5), index=['row1','row2','row3','row4'],columns=['col1','col2','col3','col4','col5'])

In [None]:
df

In [None]:
type(df)

In [None]:
# A DataFrame consists of multiple rows and columns.
# A single row or a single column selected from it is returned as a Series.
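Whether a selection comes back as a Series or a DataFrame depends on how it is written, not only on how many columns it contains. A minimal sketch, rebuilding the same 4x5 frame used above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20).reshape(4, 5),
                  index=["row1", "row2", "row3", "row4"],
                  columns=["col1", "col2", "col3", "col4", "col5"])

one_col = df["col1"]        # a single column label returns a Series
one_col_df = df[["col1"]]   # a list of labels returns a DataFrame (here with one column)
one_row = df.loc["row1"]    # a single row label returns a Series indexed by the column names

print(type(one_col).__name__, type(one_col_df).__name__, type(one_row).__name__)
```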

In [None]:
df.loc['row1']
#loc: Access a group of rows and columns by label(s) or a boolean array.
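The two forms of `.loc` input named in the comment above, label slices and boolean arrays, can be sketched on the same frame. Note that a label slice includes both of its endpoints.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20).reshape(4, 5),
                  index=["row1", "row2", "row3", "row4"],
                  columns=["col1", "col2", "col3", "col4", "col5"])

# Label slices in .loc include BOTH endpoints: row1..row2 and col1..col3
block = df.loc["row1":"row2", "col1":"col3"]

# A boolean Series selects the rows where it is True (col1 is 0, 5, 10, 15)
mask = df["col1"] > 5
big_rows = df.loc[mask, ["col1", "col5"]]

print(block)
print(big_rows)
```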

In [None]:
type(df.loc['row1'])

In [None]:
df['col1']

In [None]:
type(df['col1'])

In [None]:
df.iloc[:,:] 
#iloc: Purely integer-location based indexing for selection by position.

In [None]:
df.iloc[1,1] #Returns value of row2, col2

In [None]:
df.iloc[:,2] #Returns all the rows of col3

In [None]:
df.iloc[:3,2:] # Returns the first 3 rows (row1 to row3) and all columns from position 2 (col3) onward
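The main practical difference between `iloc` and `loc` slices is the stop value: `iloc` excludes it, as in plain Python slicing, while `loc` includes the stop label. A sketch showing that both calls below select the same block:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20).reshape(4, 5),
                  index=["row1", "row2", "row3", "row4"],
                  columns=["col1", "col2", "col3", "col4", "col5"])

# iloc: integer positions, the stop position is EXCLUDED (rows 0, 1, 2)
by_position = df.iloc[:3, 2:]
# loc: labels, the stop label is INCLUDED (row1 through row3)
by_label = df.loc["row1":"row3", "col3":"col5"]

print(by_position.equals(by_label))
```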

In [None]:
# Convert dataframe into array

df.iloc[:3,2:].values
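The pandas documentation recommends `DataFrame.to_numpy()` over the `.values` attribute for this conversion. A minimal sketch on the same slice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 20).reshape(4, 5),
                  index=["row1", "row2", "row3", "row4"],
                  columns=["col1", "col2", "col3", "col4", "col5"])

# to_numpy() returns a plain NumPy array; row and column labels are dropped
arr = df.iloc[:3, 2:].to_numpy()
print(type(arr), arr.shape)
print(arr)
```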

In [None]:
type(df.iloc[:3,2:].values)

In [None]:
df['col1'].value_counts() # Returns a Series with the count of each unique value in col1
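Every value in `col1` is unique, so each count above is 1. A made-up column with repeated values shows the usual output, sorted from most to least frequent, and `normalize=True` returns proportions instead of counts:

```python
import pandas as pd

# Made-up values for illustration
fuel = pd.Series(["Petrol", "Diesel", "Petrol", "CNG", "Petrol", "Diesel"])

counts = fuel.value_counts()                # absolute counts, most frequent first
shares = fuel.value_counts(normalize=True)  # fraction of rows per value

print(counts)
print(shares)
```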

In [None]:
df=pd.read_csv('../input/vehicle-dataset-from-cardekho/car data.csv')

In [None]:
# Top 5 records of the dataset
df.head()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df['Fuel_Type'].value_counts()

In [None]:
df[df['Kms_Driven']>90000]
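Conditions can be combined with `&` (and), `|` (or) and `~` (not), with each condition wrapped in parentheses; `DataFrame.query` expresses the same filter as a string. The rows below are made up and only borrow column names from the car dataset, so the sketch runs without the Kaggle file.

```python
import pandas as pd

# Made-up rows that only borrow column names from the car dataset
cars = pd.DataFrame({
    "Kms_Driven": [27000, 95000, 120000, 43000],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "Selling_Price": [3.35, 4.75, 2.85, 7.25],
})

# Boolean indexing: parentheses are required around each condition
high_km_diesel = cars[(cars["Kms_Driven"] > 90000) & (cars["Fuel_Type"] == "Diesel")]

# The same filter written with query()
same = cars.query("Kms_Driven > 90000 and Fuel_Type == 'Diesel'")

print(high_km_diesel)
```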

In [None]:
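df.corr(numeric_only=True)  # numeric_only (pandas >= 1.5) skips text columns such as Fuel_Type; pandas 2.x raises an error on them otherwise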
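A small self-contained sketch of the `numeric_only` behaviour, assuming pandas 1.5 or newer: the text column is left out and only the numeric columns appear in the correlation matrix. The frame is made up.

```python
import pandas as pd

# Made-up frame with two numeric columns and one text column
frame = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6], "name": ["x", "y", "z"]})

# Pearson correlation between numeric columns only; "name" is ignored
corr = frame.corr(numeric_only=True)
print(corr)
```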

In [None]:
lst = [[11,22,33],[66,77,np.nan],[44,np.nan,55],[88,np.nan,np.nan],[99,22,77]]

In [None]:
df = pd.DataFrame(lst)

In [None]:
df

### Handling missing values 

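Drop NaN values: `axis=0` drops every row that contains a NaN, and `axis=1` drops every column that contains a NaN.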

In [None]:
df.dropna(axis=0)

In [None]:
df.dropna(axis=1)
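Dropping every row or column with any NaN can discard a lot of data. `dropna` can be made more selective with `thresh` (minimum number of non-missing values to keep a row), `subset` (only look at certain columns) and `how="all"` (drop only fully empty rows). A sketch on the same `lst` data:

```python
import numpy as np
import pandas as pd

lst = [[11, 22, 33], [66, 77, np.nan], [44, np.nan, 55], [88, np.nan, np.nan], [99, 22, 77]]
df = pd.DataFrame(lst)

at_least_two = df.dropna(thresh=2)    # keep rows with at least 2 non-missing values
col1_present = df.dropna(subset=[1])  # drop rows only when column 1 is missing
no_empty_rows = df.dropna(how="all")  # drop rows only when every value is missing

print(at_least_two)
print(col1_present)
```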

In [None]:
new_df = df.reindex([0,'a',1,'b',2,'c',3,'d',4])

In [None]:
new_df

In [None]:
pd.isna(new_df[1])

In [None]:
new_df[2].notna()

In [None]:
new_df.fillna('Missing')
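Filling every gap with the same string, as above, leaves the columns that had gaps with object dtype. A common alternative, sketched here on the `lst` data, fills each column with its own mean, or with a per-column constant passed as a dict:

```python
import numpy as np
import pandas as pd

lst = [[11, 22, 33], [66, 77, np.nan], [44, np.nan, 55], [88, np.nan, np.nan], [99, 22, 77]]
df = pd.DataFrame(lst)

# Each column's NaNs are replaced by that column's mean (NaNs are skipped when computing it)
filled = df.fillna(df.mean())

# A dict maps column label -> fill value; columns not in the dict are left alone
by_dict = df.fillna({1: 0, 2: -1})

print(filled)
print(by_dict)
```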

### CSV

Reading files from different sources

In [None]:
from io import StringIO, BytesIO

In [None]:
data = ('col1,col2,col3\n'
        'x,y,1\n'
        'p,q,2\n'
        'a,b,3')

In [None]:
type(data)

In [None]:
pd.read_csv(StringIO(data))

In [None]:
# Read only specific columns (the lambda matches column names case-insensitively)

df = pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ['COL1','COL3'])

In [None]:
df

Write it to a .csv file

In [None]:
df.to_csv('Example1.csv')
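By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. A sketch of both round trips, written to a temporary directory (the file names are arbitrary):

```python
import os
import tempfile
from io import StringIO

import pandas as pd

data = ("col1,col2,col3\n"
        "x,y,1\n"
        "p,q,2\n"
        "a,b,3")
df = pd.read_csv(StringIO(data), usecols=["col1", "col3"])

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = os.path.join(tmp, "with_index.csv")
    no_index_path = os.path.join(tmp, "no_index.csv")

    df.to_csv(with_index_path)               # index written as an unnamed first column
    df.to_csv(no_index_path, index=False)    # index left out

    with_index = pd.read_csv(with_index_path)
    back = pd.read_csv(no_index_path)

print(list(with_index.columns))
print(back.equals(df))
```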

In [None]:
# Specifying column data type

data = ('a,b,c,d\n'
       '1,2,3,4\n'
       '5,6,7,8\n'
       '9,10,11,12')

In [None]:
print(data)

In [None]:
df = pd.read_csv(StringIO(data), dtype=object)

In [None]:
df

In [None]:
df.info()

In [None]:
df['b'][1]

In [None]:
df = pd.read_csv(StringIO(data), dtype = {'a':int, 'b':float, 'c': 'Int64'})

In [None]:
df

In [None]:
df.info()

In [None]:
df['b'][1]

In [None]:
df.dtypes
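The `'Int64'` (capital I) dtype used above is pandas' nullable integer type. Its advantage shows when a column has missing values: by default the gap forces the column to `float64`, while `'Int64'` keeps integers and marks the gap as `<NA>`. A sketch with a made-up string:

```python
from io import StringIO

import pandas as pd

# Made-up data: the second row has an empty value in column b
data = "a,b\n1,2\n3,\n5,6"

plain = pd.read_csv(StringIO(data))                         # b becomes float64 (2.0, NaN, 6.0)
nullable = pd.read_csv(StringIO(data), dtype={"b": "Int64"})  # b stays integer with <NA>

print(plain.dtypes)
print(nullable.dtypes)
print(nullable)
```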

##### Index columns and trailing delimiters

In [None]:
data = ('index,a,b,c\n'
       '1,apple,50,3\n'
       '2,banana,40,6')

In [None]:
df = pd.read_csv(StringIO(data), index_col=0)

In [None]:
df

In [None]:
data = ('a,b,c\n'
       '1,apple,50,\n'
       '2,banana,40,')

In [None]:
df = pd.read_csv(StringIO(data), index_col=0)

In [None]:
df

In [None]:
pd.read_csv(StringIO(data), index_col=False)
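The trailing-delimiter case can be seen in isolation with the example used in the pandas documentation: each data row ends with a comma, so it has one more field than the header. Without guidance, pandas treats the first field as the index; `index_col=False` keeps it as an ordinary column and drops the empty trailing field.

```python
from io import StringIO

import pandas as pd

# Each data row ends with a delimiter, so it has 4 fields against a 3-name header
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

inferred = pd.read_csv(StringIO(data))                # first field becomes the index
fixed = pd.read_csv(StringIO(data), index_col=False)  # first field stays in column "a"

print(inferred)
print(fixed)
```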

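Combining `index_col` and `usecols` (when `usecols` is given, `index_col` is counted within the selected columns, not the original file)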

In [None]:
pd.read_csv(StringIO(data), usecols=['a','c'], index_col=0)

### Quoting and escape characters (useful for NLP text data)

In [None]:
data = 'a,b\n"hello \\"Bob\\", nice to see you!", 2'

In [None]:
pd.read_csv(StringIO(data), escapechar='\\')
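Many CSV writers escape a quote inside a quoted field by doubling it (`""`) instead of using a backslash. That convention needs no `escapechar`, because `read_csv` treats a doubled quote as a literal quote by default (`doublequote=True`). A sketch with a made-up string:

```python
from io import StringIO

import pandas as pd

# Made-up text: quotes inside the quoted field are escaped by doubling them
data = 'a,b\n"hello ""Bob"", nice to see you!",2'

quoted = pd.read_csv(StringIO(data))  # doublequote=True is the default
print(quoted.loc[0, "a"])
```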

### Reading a CSV from a URL

In [None]:
df = pd.read_csv('https://download.bls.gov/pub/time.series/cu/cu.item', sep='\t')
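`read_csv` accepts a URL wherever it accepts a file path, and `sep="\t"` is what makes it split fields on tabs. Because the URL above depends on network access, the sketch below shows the same tab-separated parsing on a small made-up string.

```python
from io import StringIO

import pandas as pd

# Made-up tab-separated text standing in for a remote .tsv-style file
data = "code\tname\nA1\tApples\nB2\tBread"

items = pd.read_csv(StringIO(data), sep="\t")
print(items)
```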

In [None]:
df.head()

### Reading JSON and converting between CSV and JSON

In [None]:
data = '{"emp_name":["Bhavya"],"emp_dept":["IT dept"],"email":["joshibhavya2000@gmail.com"],"job_title":["data scientist"]}'

In [None]:
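pd.read_json(StringIO(data))  # wrapping the literal string in StringIO (imported above) avoids the FutureWarning pandas 2.1+ emits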
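The string above maps each column name to a list of values. `read_json` can also read a list of records (one JSON object per row) when told so through its `orient` parameter. A sketch with made-up names:

```python
from io import StringIO

import pandas as pd

# Made-up records: one JSON object per row
records = '[{"emp_name": "Ana", "emp_dept": "IT"}, {"emp_name": "Raj", "emp_dept": "HR"}]'

emps = pd.read_json(StringIO(records), orient="records")
print(emps)
```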

In [None]:
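# The UCI wine data file has no header row, so header=None gives integer column labels 0, 1, 2, ...
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)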

In [None]:
df.head()

In [None]:
# Write the DataFrame to a CSV file
df.to_csv('wine.csv')

In [None]:
# Serialize the DataFrame to JSON using different orientations

df.to_json(orient='index')

In [None]:
df.to_json(orient='records')
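The difference between the two orientations is easier to see on a tiny frame: `orient='index'` produces one object per row keyed by the index label, while `orient='records'` produces a list of row objects without the index. A sketch, including a round trip back to a DataFrame:

```python
from io import StringIO

import pandas as pd

small = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

as_index = small.to_json(orient="index")      # {index label: {column: value}}
as_records = small.to_json(orient="records")  # [{column: value}, ...]

print(as_index)
print(as_records)

# Reading the records back requires the same orient
back = pd.read_json(StringIO(as_records), orient="records")
```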

### Reading HTML Content

In [None]:
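url_mcc = 'https://en.wikipedia.org/wiki/Mobile_country_code'
# read_html returns a list of DataFrames, one per <table> whose text matches 'Country';
# it needs an HTML parser such as lxml (or BeautifulSoup4 with html5lib) installed
dfs = pd.read_html(url_mcc, match='Country', header=0)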


In [None]:
dfs[0]

### Reading Excel files

In [None]:
#df_excel=pd.read_excel('Excel_Sample.xlsx')
#df_excel.head()

### Pickling 

All pandas objects are equipped with a `to_pickle` method, which uses Python's `pickle` module to save data structures to disk in the pickle format.

In [None]:
#df_excel.to_pickle('df_excel')
#df=pd.read_pickle('df_excel')
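Since the Excel file above is commented out, here is a self-contained sketch of a pickle round trip on a made-up frame, written to a temporary directory. Unlike a CSV round trip, pickling preserves the column dtypes (here a datetime column and a nullable `Int64` column). As Python's `pickle` documentation warns, only unpickle files from sources you trust.

```python
import os
import tempfile

import pandas as pd

# Made-up frame with dtypes that CSV cannot preserve on its own
df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "n": pd.array([1, None], dtype="Int64"),
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "frame.pkl")
    csv_path = os.path.join(tmp, "frame.csv")

    df.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)   # same values and same dtypes

    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)      # "when" comes back as text, not datetime

print(restored.dtypes)
print(from_csv.dtypes)
```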