### Pandas: DataFrame and Series

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional, labeled, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
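
As a quick illustration of the two structures described above, the following minimal sketch builds one of each. The variable names and values (`ages`, `people`, `Active`) are made up for illustration only.

```python
import pandas as pd

# A Series is one-dimensional: a single run of values plus an index of labels.
ages = pd.Series([25, 30, 45], name="Age")

# A DataFrame is two-dimensional, and each column can hold a different type.
people = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Age": [25, 30, 45]})

print(ages.ndim, people.ndim)   # number of dimensions of each structure
print(people.dtypes)            # one dtype per column

# "Size-mutable" means columns can be added after creation.
people["Active"] = [True, False, True]
print(people.shape)
```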

In [4]:
import pandas as pd

In [5]:
## Series
# A pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a single column in a table.

data=[1,2,3,4,5,6]
series=pd.Series(data)
print(f"Series:\n{series}")

Series:
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64


In [6]:
## Check the type of the Series object

print(type(series))

<class 'pandas.core.series.Series'>


In [7]:
## Create a Series from a dictionary (the keys become the index labels)
data={'a':1,'b':2,'c':3}
series_dict=pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [10]:
## Create a Series with a custom index
data=[10,20,30]
index=['a','b','c']

series_index = pd.Series(data,index=index)
print(series_index)

a    10
b    20
c    30
dtype: int64
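
Once a Series has custom labels, values can be read either by label or by position. The sketch below rebuilds the same `series_index` and shows both; `by_label`, `by_position`, and `doubled` are illustrative names, not part of the original notebook.

```python
import pandas as pd

series_index = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series_index['b']          # look up by index label
by_position = series_index.iloc[1]    # look up by integer position
doubled = series_index * 2            # arithmetic is element-wise and keeps the labels

print(by_label, by_position)
print(doubled)
```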


In [11]:
## DataFrame

## Create a DataFrame from a dictionary of lists

data = {
    'Name': ['Krish','Jhon','Jack'],
    'Age': [25,30,45],
    'number': [1,5,4],
    'zip_code':[4,7,8]
}

df = pd.DataFrame(data)
print(df)

    Name  Age  number  zip_code
0  Krish   25       1         4
1   Jhon   30       5         7
2   Jack   45       4         8


In [12]:
import numpy as np

# Convert the DataFrame to a NumPy array (mixed column types produce an object array)
arr = np.array(df)
print(arr)

[['Krish' 25 1 4]
 ['Jhon' 30 5 7]
 ['Jack' 45 4 8]]
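
The conversion above can also be written with `DataFrame.to_numpy()`. The sketch below, which uses a trimmed copy of the same data, suggests why the printed array mixes strings and integers: with a text column present the result is an object array, while selecting only numeric columns gives an integer array. The names `mixed` and `numeric` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Krish', 'Jhon', 'Jack'], 'Age': [25, 30, 45], 'number': [1, 5, 4]})

mixed = df.to_numpy()                       # strings and ints together -> object array
numeric = df[['Age', 'number']].to_numpy()  # ints only -> integer array

print(mixed.dtype, mixed.shape)
print(numeric.dtype, numeric.shape)
```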


In [17]:
## Create a dataframe from a list of dictionaries

data = [
    {'Name':'Krish','Age':25,'Number':1,'zip_code':4},
    {'Name': 'Jhon','Age':30,'Number':5,'zip_code':7},
    {'Name': 'Jack','Age':45,'Number':4,'zip_code':8}
]

df = pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age  Number  zip_code
0  Krish   25       1         4
1   Jhon   30       5         7
2   Jack   45       4         8
<class 'pandas.core.frame.DataFrame'>
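
With a list of dictionaries, the records do not all need the same keys. As a small sketch with made-up records (`records` and `df_partial` are illustrative names), a key missing from one record shows up as a missing value (`NaN`) in that row:

```python
import pandas as pd

records = [
    {'Name': 'Krish', 'Age': 25},
    {'Name': 'Jhon'},            # this record has no 'Age' key
]
df_partial = pd.DataFrame(records)

print(df_partial)
print(df_partial['Age'].isna().tolist())   # which rows are missing 'Age'
```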


In [26]:
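## Read a CSV file into a DataFrame
df = pd.read_csv('example.csv')

df.head(5)  # first 5 rows (not displayed: a notebook shows only the last expression of a cell)
df.tail(5)  # last 5 rows (the file has only 4 rows, so all of them are shown)

    Name  Age  Number  Zip_code
0   Hola   12      89        45
1   Faaa   87       9         6
2    Alo    7       5         8
3  Acaaa    8       6         8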
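
To try `read_csv`, `head`, and `tail` without a file on disk, the CSV text can be passed through `io.StringIO`. The contents of `csv_text` below are an assumption reconstructed from the output shown above, not the actual `example.csv`.

```python
import io
import pandas as pd

# Assumed CSV text standing in for the contents of a file on disk.
csv_text = "Name,Age,Number,Zip_code\nHola,12,89,45\nFaaa,87,9,6\nAlo,7,5,8\nAcaaa,8,6,8\n"

df_csv = pd.read_csv(io.StringIO(csv_text))
first_two = df_csv.head(2)   # first 2 rows
last_two = df_csv.tail(2)    # last 2 rows

print(first_two)
print(last_two)
```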


In [27]:
## Accessing data from a DataFrame
df

    Name  Age  Number  Zip_code
0   Hola   12      89        45
1   Faaa   87       9         6
2    Alo    7       5         8
3  Acaaa    8       6         8


In [28]:
df['Name']

0     Hola
1     Faaa
2      Alo
3    Acaaa
Name: Name, dtype: object
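
Selecting with a single column name returns a Series, as in the output above. A short sketch on made-up data shows the related case of passing a list of names, which returns a DataFrame; `one_col` and `two_cols` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Hola', 'Faaa'], 'Age': [12, 87], 'Number': [89, 9]})

one_col = df['Name']             # single name -> Series
two_cols = df[['Name', 'Age']]   # list of names -> DataFrame

print(type(one_col).__name__, type(two_cols).__name__)
```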

In [29]:
for i in df['Name']:
    print(i)

Hola
Faaa
Alo
Acaaa


In [31]:
## Access a value by row label and column name with loc
df.loc[0, 'Name']

'Hola'

In [35]:
## Access a value by row position and column position with iloc
print(df.iloc[0, 2])

89
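
With the default 0, 1, 2, ... index, label-based and position-based access happen to agree, which hides the difference between `loc` and `iloc`. The sketch below uses a hypothetical DataFrame `df_lab` with row labels 10, 20, 30 to make the distinction visible.

```python
import pandas as pd

df_lab = pd.DataFrame({'Name': ['Hola', 'Faaa', 'Alo'], 'Age': [12, 87, 7]}, index=[10, 20, 30])

by_label = df_lab.loc[20, 'Age']     # row whose label is 20
by_position = df_lab.iloc[1, 1]      # second row, second column
print(by_label, by_position)

# Slices differ too: loc includes the end label, iloc excludes the end position.
print(len(df_lab.loc[10:20]), len(df_lab.iloc[0:1]))
```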


In [41]:
## Accessing a specified element using at

df.at[1,'Name']

'Faaa'

In [43]:
## Accessing a specified element using iat

df.iat[2,2]

np.int64(5)
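
`at` and `iat` can also write a single cell, not only read one. A minimal sketch on made-up data (`df_small` is an illustrative name):

```python
import pandas as pd

df_small = pd.DataFrame({'Name': ['Hola', 'Faaa'], 'Age': [12, 87]})

df_small.at[1, 'Age'] = 88   # set by row label and column name
df_small.iat[0, 1] = 13      # set by row position and column position

print(df_small)
```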

In [44]:
df

    Name  Age  Number  Zip_code
0   Hola   12      89        45
1   Faaa   87       9         6
2    Alo    7       5         8
3  Acaaa    8       6         8


In [45]:
## Data handling with a DataFrame
## Assigning a column (adds it if the name is new; 'Number' already exists, so its values are replaced)
df['Number']=[1000,5000,6000,4000]
df

    Name  Age  Number  Zip_code
0   Hola   12    1000        45
1   Faaa   87    5000         6
2    Alo    7    6000         8
3  Acaaa    8    4000         8
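
Column assignment behaves differently depending on whether the name already exists. The sketch below uses made-up data, and the column name `Age_next_year` is hypothetical: a new name appends a column, while an existing name replaces its values.

```python
import pandas as pd

df_cols = pd.DataFrame({'Name': ['Hola', 'Faaa'], 'Age': [12, 87]})

df_cols['Age_next_year'] = df_cols['Age'] + 1   # new name -> column is appended
df_cols['Age'] = [20, 30]                       # existing name -> values are replaced

print(df_cols)
```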


In [48]:
## Remove a column 
df.drop('Number',axis=1,inplace=True)

In [49]:
df

    Name  Age  Zip_code
0   Hola   12        45
1   Faaa   87         6
2    Alo    7         8
3  Acaaa    8         8


In [50]:
## 'Number' was dropped above, so this lookup raises a KeyError
df['Number'] = df['Number'] + 1

KeyError: 'Number'
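
One way to avoid a `KeyError` like the one above is to check for the column before using it. This is only a sketch on made-up data (`df_guard` is an illustrative name); it also shows `errors='ignore'`, which lets `drop` skip labels that do not exist.

```python
import pandas as pd

df_guard = pd.DataFrame({'Name': ['Hola', 'Faaa'], 'Age': [12, 87]})

# Only update the column if it is actually present.
if 'Number' in df_guard.columns:
    df_guard['Number'] = df_guard['Number'] + 1
else:
    print("No 'Number' column; nothing to update")

# errors='ignore' makes drop a no-op for labels that are not found.
df_guard = df_guard.drop(columns=['Number'], errors='ignore')
print(list(df_guard.columns))
```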

In [51]:
df

    Name  Age  Zip_code
0   Hola   12        45
1   Faaa   87         6
2    Alo    7         8
3  Acaaa    8         8


In [52]:
## Add 1 to every value in the 'Age' column
df['Age'] = df['Age'] + 1

In [53]:
df

    Name  Age  Zip_code
0   Hola   13        45
1   Faaa   88         6
2    Alo    8         8
3  Acaaa    9         8


In [54]:
## Drop the row labeled 0 (returns a new DataFrame; df itself is unchanged)
df.drop(0)

    Name  Age  Zip_code
1   Faaa   88         6
2    Alo    8         8
3  Acaaa    9         8


In [55]:
df

    Name  Age  Zip_code
0   Hola   13        45
1   Faaa   88         6
2    Alo    8         8
3  Acaaa    9         8


In [56]:
## Drop the row labeled 0 in place (df itself is modified)
df.drop(0,inplace=True)

In [57]:
df

    Name  Age  Zip_code
1   Faaa   88         6
2    Alo    8         8
3  Acaaa    9         8
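
After a row is dropped, the remaining rows keep their original labels, so the index here starts at 1. A minimal sketch on made-up data shows how `reset_index(drop=True)` renumbers the rows from 0; `trimmed` and `renumbered` are illustrative names.

```python
import pandas as pd

df_rows = pd.DataFrame({'Name': ['Hola', 'Faaa', 'Alo'], 'Age': [13, 88, 8]})

trimmed = df_rows.drop(0)                     # new DataFrame; df_rows is unchanged
renumbered = trimmed.reset_index(drop=True)   # discard old labels, renumber from 0

print(len(df_rows), trimmed.index.tolist(), renumbered.index.tolist())
```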


In [58]:
## Reload the original data from the CSV file
df = pd.read_csv('example.csv')
df.head(5)

    Name  Age  Number  Zip_code
0   Hola   12      89        45
1   Faaa   87       9         6
2    Alo    7       5         8
3  Acaaa    8       6         8


In [59]:
## Inspect column types, summary statistics, and a grouped mean
print("Data types:\n",df.dtypes)
print('Statistical summary:\n',df.describe())
grouped= df.groupby('Zip_code')['Age'].mean()  # mean Age for each distinct Zip_code
print('Grouped:',grouped)


Data types:
 Name        object
Age          int64
Number       int64
Zip_code     int64
dtype: object
Statistical summary:
              Age     Number   Zip_code
count   4.000000   4.000000   4.000000
mean   28.500000  27.250000  16.750000
std    39.059783  41.201739  18.856917
min     7.000000   5.000000   6.000000
25%     7.750000   5.750000   7.500000
50%    10.000000   7.500000   8.000000
75%    30.750000  29.000000  17.250000
max    87.000000  89.000000  45.000000
Grouped: Zip_code
6     87.0
8      7.5
45    12.0
Name: Age, dtype: float64
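
The grouped mean above can be extended to several statistics at once with `agg`. The sketch below rebuilds just the `Age` and `Zip_code` columns from the values shown in the output; `df_g` and `summary` are illustrative names.

```python
import pandas as pd

df_g = pd.DataFrame({'Age': [12, 87, 7, 8], 'Zip_code': [45, 6, 8, 8]})

# One row per Zip_code, one column per requested statistic.
summary = df_g.groupby('Zip_code')['Age'].agg(['mean', 'count'])
print(summary)
```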


In [60]:
df.describe()

             Age     Number   Zip_code
count   4.000000   4.000000   4.000000
mean   28.500000  27.250000  16.750000
std    39.059783  41.201739  18.856917
min     7.000000   5.000000   6.000000
25%     7.750000   5.750000   7.500000
50%    10.000000   7.500000   8.000000
75%    30.750000  29.000000  17.250000
max    87.000000  89.000000  45.000000
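
By default `describe()` summarizes only the numeric columns, which is why `Name` is absent from the table above. Calling `describe()` on a text column directly gives a different summary (count, number of unique values, most frequent value, and its frequency). The data below is made up, with one repeated name so the frequency is visible.

```python
import pandas as pd

names = pd.Series(['Hola', 'Faaa', 'Alo', 'Hola'], name='Name')

stats = names.describe()   # count, unique, top, freq for text data
print(stats)
```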
