# **Pandas with Python: For beginners**

Pandas is a Python library for working with data sets. It provides functions for analyzing, cleaning, exploring, and manipulating data.

- Pandas lets us analyze large data sets and draw conclusions based on statistics.
- Pandas can clean messy data sets so that they become readable and relevant.

In [2]:
import pandas as pd

In [11]:
# Pandas DataFrame: a two-dimensional table built from a dict of columns

fruits_data = {'fruits': ["Mango", "Apple", "Coconut"], 'Price': [30, 70, 200]}
df = pd.DataFrame(fruits_data)
print(df)

    fruits  Price
0    Mango     30
1    Apple     70
2  Coconut    200
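
Once a DataFrame exists, the usual next step is to pull values out of it. The following is a minimal sketch of three common selections on the same fruit data; the variable names (`fruits`, `prices`, `first_row`, `expensive`) and the price threshold of 50 are chosen here for illustration only.

```python
import pandas as pd

fruits = pd.DataFrame({"fruits": ["Mango", "Apple", "Coconut"], "Price": [30, 70, 200]})

prices = fruits["Price"]                  # a single column comes back as a Series
first_row = fruits.loc[0]                 # a single row, selected by its index label
expensive = fruits[fruits["Price"] > 50]  # a boolean filter keeps only matching rows

print(prices.tolist())
print(first_row["fruits"])
print(expensive["fruits"].tolist())
```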


In [12]:
# Pandas Series: a one-dimensional array with a label for each value

a = [1, 7, 2]
s = pd.Series(a, index=["x", "y", "z"])
print(s)

x    1
y    7
z    2
dtype: int64
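
The custom index is what makes a Series more than a list: values can be read by label as well as by position. A short sketch, assuming the same three values and labels as above (the names `by_label`, `by_position`, and `subset` are illustrative):

```python
import pandas as pd

s = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = s["y"]        # look up a value by its index label
by_position = s.iloc[0]  # look up a value by its integer position
subset = s[["x", "z"]]   # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset)
```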


In [13]:
# Key/Value Objects as Series: the dict keys become the index labels

calories = {"day1": 420, "day2": 380, "day3": 390}
s = pd.Series(calories)
print(s)

day1    420
day2    380
day3    390
dtype: int64
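
When building a Series from a dict, the `index` argument can also be used to pick only some of the keys. The sketch below illustrates this with the same calorie data; `"day4"` is a made-up key that does not exist in the dict, included to show that a missing key becomes `NaN`.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the listed keys are kept, in the order given
first_two = pd.Series(calories, index=["day1", "day2"])

# A label with no matching key ("day4" is hypothetical) is filled with NaN
with_missing = pd.Series(calories, index=["day1", "day4"])

print(first_two)
print(with_missing)
```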


In [19]:
# Load Files Into a DataFrame

df = pd.read_csv('Live.csv')
df

,status_id,status_type,status_published,num_reactions,num_comments,num_shares,num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys,Column1,Column2,Column3,Column4
0,246675545449582_1649696485147474,video,4/22/2018 6:00,529,512,262,432,92,3,1,1,0,,,,
1,246675545449582_1649426988507757,photo,4/21/2018 22:45,150,0,0,150,0,0,0,0,0,,,,
2,246675545449582_1648730588577397,video,4/21/2018 6:17,227,236,57,204,21,1,1,0,0,,,,
3,246675545449582_1648576705259452,photo,4/21/2018 2:29,111,0,0,111,0,0,0,0,0,,,,
4,246675545449582_1645700502213739,photo,4/18/2018 3:22,213,0,0,204,9,0,0,0,0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7045,1050855161656896_1061863470556065,photo,9/24/2016 2:58,89,0,0,89,0,0,0,0,0,,,,
7046,1050855161656896_1061334757275603,photo,9/23/2016 11:19,16,0,0,14,1,0,1,0,0,,,,
7047,1050855161656896_1060126464063099,photo,9/21/2016 23:03,2,0,0,1,1,0,0,0,0,,,,
7048,1050855161656896_1058663487542730,photo,9/20/2016 0:43,351,12,22,349,2,0,0,0,0,,,,
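
`pd.read_csv` accepts any file-like object, not only a path, which makes it easy to experiment without the real file. The sketch below reads two rows copied from the output above out of an in-memory string and uses the `usecols` parameter to load only some columns. The in-memory text stands in for `Live.csv`, and the choice of columns is an assumption for illustration.

```python
import io

import pandas as pd

# Two rows copied from the output above, with only a few of the columns
csv_text = """status_id,status_type,status_published,num_reactions
246675545449582_1649696485147474,video,4/22/2018 6:00,529
246675545449582_1649426988507757,photo,4/21/2018 22:45,150
"""

# usecols limits which columns are loaded
sample = pd.read_csv(io.StringIO(csv_text), usecols=["status_type", "num_reactions"])

print(sample.shape)
print(sample.dtypes)
```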


In [20]:
# Analyzing the data
df.head()

,status_id,status_type,status_published,num_reactions,num_comments,num_shares,num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys,Column1,Column2,Column3,Column4
0,246675545449582_1649696485147474,video,4/22/2018 6:00,529,512,262,432,92,3,1,1,0,,,,
1,246675545449582_1649426988507757,photo,4/21/2018 22:45,150,0,0,150,0,0,0,0,0,,,,
2,246675545449582_1648730588577397,video,4/21/2018 6:17,227,236,57,204,21,1,1,0,0,,,,
3,246675545449582_1648576705259452,photo,4/21/2018 2:29,111,0,0,111,0,0,0,0,0,,,,
4,246675545449582_1645700502213739,photo,4/18/2018 3:22,213,0,0,204,9,0,0,0,0,,,,
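
Besides `head()`, a few other attributes and methods give a quick first look at a DataFrame. This is a small sketch on a five-row frame whose values are copied from the `head()` output above; the name `frame` and the choice of two columns are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "status_type": ["video", "photo", "video", "photo", "photo"],
    "num_reactions": [529, 150, 227, 111, 213],
})

print(frame.shape)                          # (number of rows, number of columns)
print(frame.tail(2))                        # the last two rows
print(frame.dtypes)                         # the data type of each column
print(frame["status_type"].value_counts())  # how often each category occurs
```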


In [21]:
df.describe()
# By default, describe() summarizes only the numeric columns; pass include="all" to cover categorical columns as well

,num_reactions,num_comments,num_shares,num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys,Column1,Column2,Column3,Column4
count,7050.0,7050.0,7050.0,7050.0,7050.0,7050.0,7050.0,7050.0,7050.0,0.0,0.0,0.0,0.0
mean,230.117163,224.356028,40.022553,215.043121,12.728652,1.289362,0.696454,0.243688,0.113191,,,,
std,462.625309,889.63682,131.599965,449.472357,39.97293,8.71965,3.957183,1.597156,0.726812,,,,
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
25%,17.0,0.0,0.0,17.0,0.0,0.0,0.0,0.0,0.0,,,,
50%,59.5,4.0,0.0,58.0,0.0,0.0,0.0,0.0,0.0,,,,
75%,219.0,23.0,4.0,184.75,3.0,0.0,0.0,0.0,0.0,,,,
max,4710.0,20990.0,3424.0,4710.0,657.0,278.0,157.0,51.0,31.0,,,,
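
Text columns such as `status_type` are left out of the summary above because `describe()` looks only at numeric columns unless told otherwise. As a rough sketch on a small illustrative frame (values copied from the `head()` output, names assumed), calling `describe()` on the text column itself, or passing `include="all"`, reports the count, the number of unique values, the most frequent value, and its frequency.

```python
import pandas as pd

frame = pd.DataFrame({
    "status_type": ["video", "photo", "video", "photo", "photo"],
    "num_reactions": [529, 150, 227, 111, 213],
})

# Summary of one text column: count, unique, top, freq
type_summary = frame["status_type"].describe()

# Summary of every column; statistics that do not apply to a column show as NaN
full_summary = frame.describe(include="all")

print(type_summary)
print(full_summary)
```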


### Data Cleaning

Bad data could be:
- Empty cells
- Data in the wrong format
- Wrong data (values that cannot be correct)
- Duplicate rows
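
Each of the four kinds of bad data listed above can be detected with a one-line check. The following sketch uses a small made-up frame (not the real `Live.csv` data) that contains one problem of each kind; the rule that a reaction count may not be negative is an assumption made for this example.

```python
import pandas as pd

# Illustrative data with one problem of each kind
raw = pd.DataFrame({
    "status_published": ["2018-04-22", "2018-04-21", "not a date", "2018-04-18", "2018-04-22"],
    "num_reactions": [529, None, 227, -5, 529],
})

# Empty cells: count the nulls in a column
empty_cells = int(raw["num_reactions"].isnull().sum())

# Wrong format: values that cannot be parsed as dates become NaT
parsed = pd.to_datetime(raw["status_published"], errors="coerce")
bad_format = int(parsed.isnull().sum())

# Wrong data: a reaction count below zero is assumed to be impossible
wrong_values = int((raw["num_reactions"] < 0).sum())

# Duplicates: rows that repeat an earlier row exactly
duplicates = int(raw.duplicated().sum())

print(empty_cells, bad_format, wrong_values, duplicates)
```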

In [25]:
df.isnull().sum()
# Four columns (Column1 to Column4) are null in all 7050 rows, so we will drop them

status_id              0
status_type            0
status_published       0
num_reactions          0
num_comments           0
num_shares             0
num_likes              0
num_loves              0
num_wows               0
num_hahas              0
num_sads               0
num_angrys             0
Column1             7050
Column2             7050
Column3             7050
Column4             7050
dtype: int64
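
Instead of reading the column names off the output by eye, the all-null columns can be found programmatically by comparing each null count with the number of rows. This is a minimal sketch on a toy frame; the names `toy`, `all_null`, and `cleaned` are illustrative, and the toy columns only mimic the real data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "num_likes": [432, 150, 204],
    "Column1": [np.nan, np.nan, np.nan],
    "Column2": [np.nan, np.nan, np.nan],
})

null_counts = toy.isnull().sum()

# A column is entirely null when its null count equals the number of rows
all_null = null_counts[null_counts == len(toy)].index.tolist()

cleaned = toy.drop(columns=all_null)

print(all_null)
print(cleaned.columns.tolist())
```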

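In [None]:
# Drop the four all-null columns by name (this returns a new DataFrame; df itself is unchanged)
df.drop(columns=["Column1", "Column2", "Column3", "Column4"])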

In [26]:
# dropna() with its default arguments removes every row that contains a null,
# which here is every row. Dropping the columns that are entirely null keeps all 7050 rows.
new_df = df.dropna(axis=1, how="all")

In [27]:
new_df.isnull().sum()

status_id           0
status_type         0
status_published    0
num_reactions       0
num_comments        0
num_shares          0
num_likes           0
num_loves           0
num_wows            0
num_hahas           0
num_sads            0
num_angrys          0
dtype: int64
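
The behavior of `dropna()` depends heavily on its arguments, so it helps to compare the main variants side by side. The sketch below uses a three-row toy frame (made up for illustration) in which column `b` is entirely null, similar to `Column1` through `Column4` in the real data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [np.nan, np.nan, np.nan],
    "c": ["x", "y", "z"],
})

# Default: drop every row with at least one null. Column "b" is null everywhere, so no rows survive.
rows_dropped = toy.dropna()

# Drop only the columns in which every value is null
cols_dropped = toy.dropna(axis=1, how="all")

# Drop only the rows that have a null in the listed columns
subset_dropped = toy.dropna(subset=["a"])

print(len(rows_dropped))
print(cols_dropped.columns.tolist())
print(len(subset_dropped))
```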

In [22]:
# Pandas Trick: show several DataFrames from a single cell with display()

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": list("abcd")})
df2 = pd.DataFrame({"col1": [5, 6, 7, 8], "col2": list("pqrs")})

df1

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    d


In [23]:
df2

   col1 col2
0     5    p
1     6    q
2     7    r
3     8    s


In [24]:
display(df1,df2)

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    d


   col1 col2
0     5    p
1     6    q
2     7    r
3     8    s
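
`display` is provided automatically inside Jupyter, but it is not a Python built-in, so the same cell fails in an ordinary script. One possible workaround, sketched here as an assumption rather than a recommended pattern, is to import it from `IPython.display` and fall back to `print` when IPython is not installed.

```python
import pandas as pd

try:
    from IPython.display import display  # available wherever IPython or Jupyter is installed
except ImportError:
    display = print  # plain-text fallback for ordinary scripts

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": list("abcd")})
df2 = pd.DataFrame({"col1": [5, 6, 7, 8], "col2": list("pqrs")})

# Both frames are shown one after the other, in a notebook or in a script
display(df1, df2)
```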
