## Pandas 📊
### Definition
Pandas is a powerful Python library used for data manipulation and analysis. It provides two main data structures:

- Series → 1D labeled array

- DataFrame → 2D labeled table of rows and columns

### Why Pandas?

- Handles large datasets easily

- Supports multiple file formats (CSV, Excel, SQL, JSON, etc.)

- Provides built-in methods for filtering, grouping, merging, and reshaping data

In [1]:
#Import pandas
import pandas as pd


## Creating a Series

In [2]:
import pandas as pd

# From a list
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)


a    10
b    20
c    30
d    40
dtype: int64
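A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below uses illustrative names (`prices`, `subset`) that are not part of the original notebook, and shows label-based access, position-based access, and boolean filtering.

```python
import pandas as pd

# Keys become index labels, values become the data
prices = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40})

by_label = prices['b']          # label-based lookup
by_position = prices.iloc[0]    # position-based lookup
subset = prices[prices > 15]    # boolean filtering keeps the labels

print(by_label, by_position)
print(subset)
```

Filtering returns a new Series whose index still carries the original labels, which is what distinguishes a Series from a plain list.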


## Creating a DataFrame

In [3]:
# From dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)


      Name  Age       City
0    Alice   25      Delhi
1      Bob   30     Mumbai
2  Charlie   35  Bangalore


## Reading & Writing Files

In [4]:
# Reading CSV file
df = pd.read_csv(r"D:\DATASETS-BI\Details.csv")  # raw string so backslashes in the Windows path are not treated as escapes
print(df)

# Writing CSV file
df.to_csv('output.csv', index=False)


     Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
0     B-25681    1096     658         7  Electronics  Electronic Games   
1     B-26055    5729      64        14    Furniture            Chairs   
2     B-25955    2927     146         8    Furniture         Bookcases   
3     B-26093    2847     712         8  Electronics          Printers   
4     B-25602    2617    1151         4  Electronics            Phones   
...       ...     ...     ...       ...          ...               ...   
1495  B-25700       7      -3         2     Clothing       Hankerchief   
1496  B-25757    3151     -35         7     Clothing          Trousers   
1497  B-25973    4141    1698        13  Electronics          Printers   
1498  B-25698       7      -2         1     Clothing       Hankerchief   
1499  B-25993    4363     305         5    Furniture            Tables   

      PaymentMode  
0             COD  
1             EMI  
2             EMI  
3     Credit Card  
4     Credi
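The `index=False` argument in the write step matters more than it looks. As a minimal sketch (using a small made-up frame and in-memory strings instead of files on disk), the example below compares what is written with and without the index and what comes back when each variant is read again.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row frame, only for illustration
df_demo = pd.DataFrame({'Order ID': ['B-1', 'B-2'], 'Amount': [100, 250]})

# Default: the index is written as an unnamed first column
with_index = df_demo.to_csv()
# index=False: only the data columns are written
without_index = df_demo.to_csv(index=False)

# Reading the first variant back produces an extra "Unnamed: 0" column
reloaded = pd.read_csv(StringIO(with_index))
clean = pd.read_csv(StringIO(without_index))

print(list(reloaded.columns))
print(list(clean.columns))
```

If the index carries no information (a plain 0..n-1 range), writing with `index=False` avoids the stray `Unnamed: 0` column on the next read.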

## Built-in pandas Functions


| Function        | Description              |
| --------------- | ------------------------ |
| `head()`        | Show first rows          |
| `tail()`        | Show last rows           |
| `info()`        | Summary of DataFrame     |
| `describe()`    | Statistics summary       |
| `shape`         | Number of rows & columns |
| `drop()`        | Remove rows/columns      |
| `sort_values()` | Sort data                |
| `groupby()`     | Group data by a column   |
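A few entries in the table (`tail()`, `drop()`, `groupby()`) are easiest to understand on a tiny example. The frame below is invented for illustration and only borrows column names from the order data; treat it as a sketch, not as the dataset itself.

```python
import pandas as pd

# Hypothetical sample rows
sales = pd.DataFrame({
    'Category': ['Electronics', 'Furniture', 'Furniture', 'Clothing'],
    'Amount': [1096, 5729, 2927, 7],
    'Quantity': [7, 14, 8, 2],
})

last_two = sales.tail(2)                             # last 2 rows
no_quantity = sales.drop(columns=['Quantity'])       # remove a column
totals = sales.groupby('Category')['Amount'].sum()   # total amount per category

print(sales.shape)   # shape is an attribute, so no parentheses
print(totals)
```

None of these calls modify `sales`; each returns a new object, so the result has to be assigned if it is needed later.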


In [5]:
# Display first rows
print(df.head())

# Display column
print(df['PaymentMode'])

  Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
0  B-25681    1096     658         7  Electronics  Electronic Games   
1  B-26055    5729      64        14    Furniture            Chairs   
2  B-25955    2927     146         8    Furniture         Bookcases   
3  B-26093    2847     712         8  Electronics          Printers   
4  B-25602    2617    1151         4  Electronics            Phones   

   PaymentMode  
0          COD  
1          EMI  
2          EMI  
3  Credit Card  
4  Credit Card  
0               COD
1               EMI
2               EMI
3       Credit Card
4       Credit Card
           ...     
1495            COD
1496            EMI
1497            COD
1498            COD
1499            EMI
Name: PaymentMode, Length: 1500, dtype: object


In [6]:
# Filter data
print(df[df['Amount'] > 2000])


     Order ID  Amount  Profit  Quantity     Category      Sub-Category  \
1     B-26055    5729      64        14    Furniture            Chairs   
2     B-25955    2927     146         8    Furniture         Bookcases   
3     B-26093    2847     712         8  Electronics          Printers   
4     B-25602    2617    1151         4  Electronics            Phones   
5     B-25881    2244     247         4     Clothing          Trousers   
11    B-25887    2125    -234         6  Electronics          Printers   
12    B-25923    3873    -891         6  Electronics            Phones   
14    B-25761    2188    1050         5    Furniture         Bookcases   
18    B-25853    2093     721         5    Furniture            Chairs   
1457  B-26099    2366     552         5     Clothing          Trousers   
1480  B-25862    2061     701         5    Furniture         Bookcases   
1482  B-25823    2103     322         8  Electronics  Electronic Games   
1483  B-25881    2115      23         
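Filters can be combined. The sketch below uses a small invented frame (not the real order data) to show two common patterns: joining conditions with `&` and testing membership with `isin()`.

```python
import pandas as pd

# Hypothetical sample rows
orders = pd.DataFrame({
    'Amount': [1096, 5729, 2927, 7],
    'Profit': [658, 64, -146, -3],
    'Category': ['Electronics', 'Furniture', 'Furniture', 'Clothing'],
})

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
big_and_profitable = orders[(orders['Amount'] > 2000) & (orders['Profit'] > 0)]

# isin() tests membership in a list of values
selected = orders[orders['Category'].isin(['Electronics', 'Clothing'])]

print(big_and_profitable)
print(selected)
```

Python's `and` / `or` keywords do not work here because they cannot operate element by element on a Series.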

In [7]:
print(df.info())            # Structure of data (info() prints directly and returns None, hence the "None" below)
print(df.describe())        # Stats summary
print(df.shape)             # Dimensions
print(df.sort_values(by='Amount'))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Order ID      1500 non-null   object
 1   Amount        1500 non-null   int64 
 2   Profit        1500 non-null   int64 
 3   Quantity      1500 non-null   int64 
 4   Category      1500 non-null   object
 5   Sub-Category  1500 non-null   object
 6   PaymentMode   1500 non-null   object
dtypes: int64(3), object(4)
memory usage: 82.2+ KB
None
            Amount      Profit     Quantity
count  1500.000000  1500.00000  1500.000000
mean    291.847333    24.64200     3.743333
std     461.924620   168.55881     2.184942
min       4.000000 -1981.00000     1.000000
25%      47.750000   -12.00000     2.000000
50%     122.000000     8.00000     3.000000
75%     326.250000    38.00000     5.000000
max    5729.000000  1864.00000    14.000000
(1500, 7)
     Order ID  Amount  Profit  Quantity     Category S

## Getting Started

In [8]:
## First step is to import pandas

import pandas as pd
import numpy as np

In [9]:
## Dataframe

df=pd.DataFrame(np.arange(0,20).reshape(5,4),index=['Row1','Row2','Row3','Row4','Row5'],columns=["Column1","Column2","Column3","Coumn4"])
# Constructor arguments: pd.DataFrame(data, index, columns, dtype)

In [10]:
df.head()


      Column1  Column2  Column3  Coumn4
Row1        0        1        2       3
Row2        4        5        6       7
Row3        8        9       10      11
Row4       12       13       14      15
Row5       16       17       18      19


In [11]:
df.to_csv('Test.csv')

In [12]:
## Two ways to access elements
# 1 -> loc (label-based)  2 -> iloc (integer position-based)

df.loc['Row1']

Column1    0
Column2    1
Column3    2
Coumn4     3
Name: Row1, dtype: int64

In [13]:
## Check the type

type(df.loc['Row1'])

pandas.core.series.Series

In [14]:
## Select all rows, from Column2 onward
df.iloc[:,1:]

      Column2  Column3  Coumn4
Row1        1        2       3
Row2        5        6       7
Row3        9       10      11
Row4       13       14      15
Row5       17       18      19


In [15]:
type(df.iloc[:,1:])

pandas.core.frame.DataFrame
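The difference between `loc` and `iloc` is clearest with slices. In the sketch below (a rebuilt 5x4 frame with shortened, illustrative column names `A` to `D`), a label slice includes its end point while a position slice excludes it, so the two selections return the same block.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                     index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                     columns=['A', 'B', 'C', 'D'])

# loc: label-based, and the end label of a slice is included
by_label = frame.loc['Row1':'Row3', ['A', 'B']]

# iloc: position-based, and the end position of a slice is excluded
by_position = frame.iloc[0:3, 0:2]

# A single cell; frame.iloc[1, 2] addresses the same cell
cell = frame.loc['Row2', 'C']

print(by_label.equals(by_position))
print(cell)
```

Selecting one row or one column yields a Series, while selecting several yields a DataFrame, which matches the `type(...)` results shown above.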

In [16]:
# Convert the DataFrame to a NumPy array
df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [17]:
df.iloc[:,1:].values.shape

(5, 3)

In [18]:
df['Column1'].value_counts()

Column1
0     1
4     1
8     1
12    1
16    1
Name: count, dtype: int64

In [19]:
df.isnull().sum()

Column1    0
Column2    0
Column3    0
Coumn4     0
dtype: int64

## Reading CSV files with various parameters

In [20]:
# Check which Python interpreter the notebook is using
import sys
print(sys.executable)


c:\ProgramData\anaconda3\python.exe


In [21]:
%pip install kagglehub


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [22]:
import kagglehub


In [23]:
# Download latest version of dataset
path = kagglehub.dataset_download("ensaran/mercedesbenz")
print("Dataset downloaded to:", path)


Downloading from https://www.kaggle.com/api/v1/datasets/download/ensaran/mercedesbenz?dataset_version_number=1...


100%|██████████| 564k/564k [00:01<00:00, 363kB/s]

Extracting files...
Dataset downloaded to: C:\Users\Satyajit\.cache\kagglehub\datasets\ensaran\mercedesbenz\versions\1





### Step 1 — Find what’s inside

In [24]:
import os

path = r"C:\Users\Satyajit\.cache\kagglehub\datasets\ensaran\mercedesbenz\versions\1"
print(os.listdir(path))


['test.csv', 'train.csv']


### Step 2 — Load CSV files (most Kaggle datasets are CSVs)
- Example if there’s a train.csv file:

In [27]:
import pandas as pd

train_path = os.path.join(path, "train.csv")
df = pd.read_csv(train_path)

print(df.head())  # Show first 5 rows


   ID       y  X0 X1  X2 X3 X4 X5 X6 X8  ...  X375  X376  X377  X378  X379  \
0   0  130.81   k  v  at  a  d  u  j  o  ...     0     0     1     0     0   
1   6   88.53   k  t  av  e  d  y  l  o  ...     1     0     0     0     0   
2   7   76.26  az  w   n  c  d  x  j  x  ...     0     0     0     0     0   
3   9   80.62  az  t   n  f  d  x  l  e  ...     0     0     0     0     0   
4  13   78.02  az  v   n  f  d  h  d  n  ...     0     0     0     0     0   

   X380  X382  X383  X384  X385  
0     0     0     0     0     0  
1     0     0     0     0     0  
2     0     1     0     0     0  
3     0     0     0     0     0  
4     0     0     0     0     0  

[5 rows x 378 columns]


### Step 3 — If there are multiple files

In [28]:
train_df = pd.read_csv(os.path.join(path, "train.csv"))
test_df = pd.read_csv(os.path.join(path, "test.csv"))

print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)


Train shape: (4209, 378)
Test shape: (4209, 377)
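The training frame has one more column than the test frame. A quick way to see which one is to compare the column indexes. The sketch below uses tiny stand-in frames (the names `train_demo` and `test_demo` are hypothetical) rather than the downloaded files.

```python
import pandas as pd

# Tiny stand-ins for the real train/test frames
train_demo = pd.DataFrame({'ID': [0, 6], 'y': [130.81, 88.53], 'X0': ['k', 'k']})
test_demo = pd.DataFrame({'ID': [1, 2], 'X0': ['az', 't']})

# Columns present in train but absent from test
extra_cols = train_demo.columns.difference(test_demo.columns)
print(list(extra_cols))
```

Applied to the real frames, the same call would be `train_df.columns.difference(test_df.columns)`; the previews suggest the extra column is the target `y`.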


In [31]:
train_df.head(5)

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0


In [32]:
test_df.head(5)

Unnamed: 0,ID,X0,X1,X2,X3,X4,X5,X6,X8,X10,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,1,az,v,n,f,d,t,a,w,0,...,0,0,0,1,0,0,0,0,0,0
1,2,t,b,ai,a,d,b,g,y,0,...,0,0,1,0,0,0,0,0,0,0
2,3,az,v,as,f,d,a,j,j,0,...,0,0,0,1,0,0,0,0,0,0
3,4,az,l,n,f,d,z,l,n,0,...,0,0,0,1,0,0,0,0,0,0
4,5,w,s,as,c,d,y,i,m,0,...,1,0,0,0,0,0,0,0,0,0


In [33]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 378 entries, ID to X385
dtypes: float64(1), int64(369), object(8)
memory usage: 12.1+ MB


In [34]:
train_df.describe()

Unnamed: 0,ID,y,X10,X11,X12,X13,X14,X15,X16,X17,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,4205.960798,100.669318,0.013305,0.0,0.075077,0.057971,0.42813,0.000475,0.002613,0.007603,...,0.318841,0.057258,0.314802,0.02067,0.009503,0.008078,0.007603,0.001663,0.000475,0.001426
std,2437.608688,12.679381,0.11459,0.0,0.263547,0.233716,0.494867,0.021796,0.051061,0.086872,...,0.466082,0.232363,0.464492,0.142294,0.097033,0.089524,0.086872,0.040752,0.021796,0.037734
min,0.0,72.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2095.0,90.82,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4220.0,99.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,6314.0,109.01,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,8417.0,265.32,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [35]:
# Get the unique category counts
train_df['X0'].value_counts()

X0
z     360
ak    349
y     324
ay    313
t     306
x     300
o     269
f     227
n     195
w     182
j     181
az    175
aj    151
s     106
ap    103
h      75
d      73
al     67
v      36
af     35
ai     34
m      34
e      32
ba     27
at     25
a      21
ax     19
aq     18
i      18
am     18
u      17
aw     16
l      16
ad     14
k      11
au     11
b      11
r      10
as     10
bc      6
ao      4
c       3
q       2
aa      2
ac      1
g       1
ab      1
Name: count, dtype: int64

In [36]:
train_df[train_df['y']>100]

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
6,24,128.76,al,r,e,f,d,f,h,s,...,0,0,0,0,0,0,0,0,0,0
8,27,108.67,w,s,as,e,d,f,i,h,...,1,0,0,0,0,0,0,0,0,0
9,30,126.99,j,b,aq,c,d,f,a,e,...,0,0,1,0,0,0,0,0,0,0
10,31,102.09,h,r,r,f,d,f,h,p,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4202,8402,123.34,ap,l,s,c,d,aa,d,r,...,0,0,0,0,0,0,0,0,0,0
4204,8405,107.39,ak,s,as,c,d,aa,d,q,...,1,0,0,0,0,0,0,0,0,0
4205,8406,108.77,j,o,t,d,d,aa,h,h,...,0,1,0,0,0,0,0,0,0,0
4206,8412,109.22,ak,v,r,a,d,aa,g,e,...,0,0,1,0,0,0,0,0,0,0


In [54]:
import numpy as np

In [57]:
lst_data=[[1,2,3],[3,4,np.nan],[5,6,np.nan],[np.nan,np.nan,np.nan]]
df=pd.DataFrame(lst_data)
df.head()

     0    1    2
0  1.0  2.0  3.0
1  3.0  4.0  NaN
2  5.0  6.0  NaN
3  NaN  NaN  NaN


In [None]:
## Handling Missing Values
## Drop NaN values
df.dropna(axis=0)  # drop every row that contains at least one NaN

     0    1    2
0  1.0  2.0  3.0
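Dropping every row that has any NaN is often too aggressive, as the single surviving row shows. As a hedged sketch on the same 4x3 data (variable names are illustrative), `dropna` accepts `how`, `thresh`, and `subset` to control how strict the filter is.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1, 2, 3], [3, 4, np.nan], [5, 6, np.nan],
                      [np.nan, np.nan, np.nan]])

any_nan = frame.dropna()                 # default how='any': drop rows with at least one NaN
all_nan = frame.dropna(how='all')        # drop only rows where every value is NaN
at_least_two = frame.dropna(thresh=2)    # keep rows with at least 2 non-NaN values
col0_present = frame.dropna(subset=[0])  # only look at column 0 when deciding

print(len(any_nan), len(all_nan), len(at_least_two), len(col0_present))
```

Here only the fully empty last row is removed by the three gentler variants, so most of the data is preserved.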


In [63]:
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                     columns=['one', 'two', 'three'])

In [64]:
df

Unnamed: 0,one,two,three
a,1.379161,-2.697307,-2.164108
c,-0.81128,0.968314,0.144188
e,-0.749323,0.502272,-1.03912
f,-1.265932,-0.351529,-1.243552
h,-0.434794,-1.449924,-0.553234


In [65]:
df2=df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

In [66]:
df2

Unnamed: 0,one,two,three
a,1.379161,-2.697307,-2.164108
b,,,
c,-0.81128,0.968314,0.144188
d,,,
e,-0.749323,0.502272,-1.03912
f,-1.265932,-0.351529,-1.243552
g,,,
h,-0.434794,-1.449924,-0.553234


In [67]:
df2.dropna(axis=0)


Unnamed: 0,one,two,three
a,1.379161,-2.697307,-2.164108
c,-0.81128,0.968314,0.144188
e,-0.749323,0.502272,-1.03912
f,-1.265932,-0.351529,-1.243552
h,-0.434794,-1.449924,-0.553234


In [68]:
pd.isna(df2['one'])


a    False
b     True
c    False
d     True
e    False
f    False
g     True
h    False
Name: one, dtype: bool

In [69]:
df2['one'].notna()


a     True
b    False
c     True
d    False
e     True
f     True
g    False
h     True
Name: one, dtype: bool

In [70]:
df2.fillna('Missing')


Unnamed: 0,one,two,three
a,1.379161,-2.697307,-2.164108
b,Missing,Missing,Missing
c,-0.81128,0.968314,0.144188
d,Missing,Missing,Missing
e,-0.749323,0.502272,-1.03912
f,-1.265932,-0.351529,-1.243552
g,Missing,Missing,Missing
h,-0.434794,-1.449924,-0.553234
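Filling with a string such as `'Missing'` mixes text into numeric columns, so arithmetic on them stops working. When the columns should stay numeric, a numeric fill is usually a better fit. The sketch below (on a small made-up frame) shows a constant fill, a per-column mean fill, and a forward fill.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'one': [1.0, np.nan, 3.0], 'two': [np.nan, 4.0, 6.0]})

as_text = frame.fillna('Missing')        # strings now sit next to floats
with_zero = frame.fillna(0)              # constant numeric fill
with_mean = frame.fillna(frame.mean())   # each column filled with its own mean
forward = frame.ffill()                  # carry the previous value forward

print(with_mean)
print(forward)
```

A forward fill leaves a leading NaN untouched because there is no earlier value to copy, which is visible in column `two`.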


In [71]:
df2['one'].values


array([ 1.37916117,         nan, -0.81128049,         nan, -0.74932317,
       -1.26593221,         nan, -0.4347945 ])

### Reading different data sources with the help of pandas

## CSV

In [15]:
import pandas as pd
import numpy as np
from io import StringIO, BytesIO

In [16]:
data=('col1, col2, col3\n'
      'x,y,1\n'
      'a,b,2\n'
      'c,d,3')

In [17]:
type(data)

str

In [18]:
pd.read_csv(StringIO(data))

Unnamed: 0,col1,col2,col3
0,x,y,1
1,a,b,2
2,c,d,3


In [23]:
# pd.read_csv(StringIO(data), usecols=lambda x:x.upper() in ['COL1','COL3'])
df=pd.read_csv(StringIO(data),usecols=['col1'])
df

Unnamed: 0,col1
0,x
1,a
2,c
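`usecols` accepts either a list of names or a callable, as the commented-out line hints. The sketch below uses a header without spaces after the commas (an assumption that differs from the string above, where the spaces would become part of the names ` col2` and ` col3`).

```python
import pandas as pd
from io import StringIO

csv_text = ('col1,col2,col3\n'
            'x,y,1\n'
            'a,b,2\n'
            'c,d,3')

# A list selects columns by exact name
by_name = pd.read_csv(StringIO(csv_text), usecols=['col1', 'col3'])

# A callable receives each column name and returns True to keep it
by_rule = pd.read_csv(StringIO(csv_text),
                      usecols=lambda name: name.upper() in ['COL1', 'COL3'])

print(list(by_name.columns))
print(list(by_rule.columns))
```

The callable form is handy when the exact capitalization of the header is not known in advance.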


In [24]:
df.to_csv('Test.csv')

In [43]:
data=('a, b, c, d\n'
      '1,2,3,4\n'
      '5,6,7,8\n'
      '9,10,11,12')

In [44]:
print(data)

a, b, c, d
1,2,3,4
5,6,7,8
9,10,11,12


In [45]:
df=pd.read_csv(StringIO(data),dtype=object)

In [46]:
df

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,5,6,7,8
2,9,10,11,12


In [47]:
df['a'][2]

'9'

In [51]:
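# skipinitialspace=True strips the space after each comma in the header,
# so the dtype keys 'b' and 'c' match the column names
df=pd.read_csv(StringIO(data), skipinitialspace=True, dtype={'b':'int64', 'c':np.float64, 'a':'Int64'})

In [52]:
df

Unnamed: 0,a,b,c,d
0,1,2,3.0,4
1,5,6,7.0,8
2,9,10,11.0,12


In [53]:
df.dtypes

a      Int64
b      int64
c    float64
d      int64
dtype: object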
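When a header row has spaces after the commas, as in the `'a, b, c, d'` string used here, pandas keeps those spaces as part of the column names unless told otherwise, and `dtype` entries keyed by the clean names then have nothing to match. The sketch below shows the effect and two possible remedies; it is one way to handle it, not the only one.

```python
import pandas as pd
from io import StringIO

csv_text = ('a, b, c, d\n'
            '1,2,3,4\n'
            '5,6,7,8')

raw = pd.read_csv(StringIO(csv_text))
print(list(raw.columns))   # names after the first keep their leading space

# Option 1: strip whitespace after each delimiter while parsing
parsed = pd.read_csv(StringIO(csv_text), skipinitialspace=True,
                     dtype={'c': 'float64'})

# Option 2: clean the names after loading
cleaned = raw.rename(columns=str.strip)

print(list(parsed.columns), parsed['c'].dtype)
print(list(cleaned.columns))
```

Checking `df.columns` right after loading is a cheap way to catch this before any column lookups fail.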

In [72]:
# Quoting and Escape Characters, very useful in NLP
data = 'a,b\n"hello, \\"Bob\\", nice to see you",5'

In [73]:
pd.read_csv(StringIO(data),escapechar='\\')


Unnamed: 0,a,b
0,"hello, ""Bob"", nice to see you",5


In [75]:
## Read JSON into a DataFrame
Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
pd.read_json(StringIO(Data))  # wrap the literal JSON in StringIO; passing a raw string is deprecated in recent pandas


Unnamed: 0,employee_name,email,job_profile
0,James,james@gmail.com,"{'title1': 'Team Lead', 'title2': 'Sr. Develop..."
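The `job_profile` column still holds a nested dictionary. One way to flatten it, sketched here with `pd.json_normalize` on the same JSON text (the variable names are illustrative), is to expand the nested list into columns and carry the top-level fields along as metadata.

```python
import json
import pandas as pd

raw = ('{"employee_name": "James", "email": "james@gmail.com", '
       '"job_profile": [{"title1": "Team Lead", "title2": "Sr. Developer"}]}')

record = json.loads(raw)

# Expand each entry of "job_profile" into its own columns,
# carrying the top-level fields along as metadata
flat = pd.json_normalize(record, record_path='job_profile',
                         meta=['employee_name', 'email'])
print(flat)
```

The result has one row per entry in `job_profile`, with `title1` and `title2` as ordinary columns.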


In [76]:
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)


In [77]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,1,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050
2,1,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735


In [79]:
# Write the DataFrame to CSV
df.to_csv('wine.csv')

In [80]:
# Serialize the DataFrame to JSON using different orientations

df.to_json(orient="index")

'{"0":{"0":1,"1":14.23,"2":1.71,"3":2.43,"4":15.6,"5":127,"6":2.8,"7":3.06,"8":0.28,"9":2.29,"10":5.64,"11":1.04,"12":3.92,"13":1065},"1":{"0":1,"1":13.2,"2":1.78,"3":2.14,"4":11.2,"5":100,"6":2.65,"7":2.76,"8":0.26,"9":1.28,"10":4.38,"11":1.05,"12":3.4,"13":1050},"2":{"0":1,"1":13.16,"2":2.36,"3":2.67,"4":18.6,"5":101,"6":2.8,"7":3.24,"8":0.3,"9":2.81,"10":5.68,"11":1.03,"12":3.17,"13":1185},"3":{"0":1,"1":14.37,"2":1.95,"3":2.5,"4":16.8,"5":113,"6":3.85,"7":3.49,"8":0.24,"9":2.18,"10":7.8,"11":0.86,"12":3.45,"13":1480},"4":{"0":1,"1":13.24,"2":2.59,"3":2.87,"4":21.0,"5":118,"6":2.8,"7":2.69,"8":0.39,"9":1.82,"10":4.32,"11":1.04,"12":2.93,"13":735},"5":{"0":1,"1":14.2,"2":1.76,"3":2.45,"4":15.2,"5":112,"6":3.27,"7":3.39,"8":0.34,"9":1.97,"10":6.75,"11":1.05,"12":2.85,"13":1450},"6":{"0":1,"1":14.39,"2":1.87,"3":2.45,"4":14.6,"5":96,"6":2.5,"7":2.52,"8":0.3,"9":1.98,"10":5.25,"11":1.02,"12":3.58,"13":1290},"7":{"0":1,"1":14.06,"2":2.15,"3":2.61,"4":17.6,"5":121,"6":2.6,"7":2.51,"8":0.3

In [81]:
df.to_json(orient="records")

'[{"0":1,"1":14.23,"2":1.71,"3":2.43,"4":15.6,"5":127,"6":2.8,"7":3.06,"8":0.28,"9":2.29,"10":5.64,"11":1.04,"12":3.92,"13":1065},{"0":1,"1":13.2,"2":1.78,"3":2.14,"4":11.2,"5":100,"6":2.65,"7":2.76,"8":0.26,"9":1.28,"10":4.38,"11":1.05,"12":3.4,"13":1050},{"0":1,"1":13.16,"2":2.36,"3":2.67,"4":18.6,"5":101,"6":2.8,"7":3.24,"8":0.3,"9":2.81,"10":5.68,"11":1.03,"12":3.17,"13":1185},{"0":1,"1":14.37,"2":1.95,"3":2.5,"4":16.8,"5":113,"6":3.85,"7":3.49,"8":0.24,"9":2.18,"10":7.8,"11":0.86,"12":3.45,"13":1480},{"0":1,"1":13.24,"2":2.59,"3":2.87,"4":21.0,"5":118,"6":2.8,"7":2.69,"8":0.39,"9":1.82,"10":4.32,"11":1.04,"12":2.93,"13":735},{"0":1,"1":14.2,"2":1.76,"3":2.45,"4":15.2,"5":112,"6":3.27,"7":3.39,"8":0.34,"9":1.97,"10":6.75,"11":1.05,"12":2.85,"13":1450},{"0":1,"1":14.39,"2":1.87,"3":2.45,"4":14.6,"5":96,"6":2.5,"7":2.52,"8":0.3,"9":1.98,"10":5.25,"11":1.02,"12":3.58,"13":1290},{"0":1,"1":14.06,"2":2.15,"3":2.61,"4":17.6,"5":121,"6":2.6,"7":2.51,"8":0.31,"9":1.25,"10":5.05,"11":1.06,"
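The long strings above are hard to compare by eye. On a two-row frame (made up for illustration) the difference between the `orient` values is easier to see.

```python
import pandas as pd
from io import StringIO

small = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

as_index = small.to_json(orient='index')      # {row label: {column: value}}
as_records = small.to_json(orient='records')  # [{column: value}, ...], row labels dropped
as_columns = small.to_json(orient='columns')  # {column: {row label: value}}

print(as_index)
print(as_records)
print(as_columns)

# Reading back with the matching orient
restored = pd.read_json(StringIO(as_records), orient='records')
print(restored)
```

`records` is the usual choice for exchanging data with web APIs, while `index` keeps the row labels at the cost of a more nested structure.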

### Reading HTML Content

In [82]:
url = 'https://www.fdic.gov/bank/individual/failed/banklist.html'

dfs = pd.read_html(url)

In [83]:
dfs

[                                            Bank Name                City  \
 0                        The Santa Anna National Bank          Santa Anna   
 1                                Pulaski Savings Bank             Chicago   
 2                  The First National Bank of Lindsay             Lindsay   
 3               Republic First Bank dba Republic Bank        Philadelphia   
 4                                       Citizens Bank            Sac City   
 5                            Heartland Tri-State Bank             Elkhart   
 6                                 First Republic Bank       San Francisco   
 7                                      Signature Bank            New York   
 8                                 Silicon Valley Bank         Santa Clara   
 9                                   Almena State Bank              Almena   
 10                         First City Bank of Florida   Fort Walton Beach   
 11                               The First State Bank       Bar

In [84]:
url_mcc = 'https://en.wikipedia.org/wiki/Mobile_country_code'
dfs = pd.read_html(url_mcc, match='Country', header=0)

In [85]:
dfs

[     Mobile country code                                    Country ISO 3166  \
 0                    289                                 A Abkhazia    GE-AB   
 1                    412                                Afghanistan       AF   
 2                    276                                    Albania       AL   
 3                    603                                    Algeria       DZ   
 4                    544  American Samoa (United States of America)       AS   
 ..                   ...                                        ...      ...   
 247                  452                                    Vietnam       VN   
 248                  543                        W Wallis and Futuna       WF   
 249                  421                                    Y Yemen       YE   
 250                  645                                   Z Zambia       ZM   
 251                  648                                   Zimbabwe       ZW   
 
                          

### Reading Excel files

In [None]:
df_excel=pd.read_excel('Excel_Sample.xlsx')


## Pickling
- All pandas objects have a `to_pickle` method, which uses Python's `pickle` module to save data structures to disk in the pickle format.

In [None]:
df_excel.to_pickle('df_excel')


In [None]:
df=pd.read_pickle('df_excel')
df.head()
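Since the cells above depend on an Excel file that may not be available, here is a self-contained sketch of the same round trip. It uses a small made-up frame and a temporary directory (both assumptions for illustration) so nothing is left on disk.

```python
import os
import tempfile
import pandas as pd

frame = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, 'frame.pkl')
    frame.to_pickle(pkl_path)             # serialize to disk
    restored = pd.read_pickle(pkl_path)   # load it back

# The round trip preserves values, dtypes, and the index
print(restored.equals(frame))
```

Pickle files can execute arbitrary code when loaded, so `read_pickle` should only be used on files from a trusted source.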
