## Pandas - DataFrame and Series

Pandas is a powerful data manipulation library for Python, widely used for **data analysis and data cleaning**.

It provides two primary data structures: **Series** and **DataFrame**.

A **Series** is a one-dimensional, array-like object with an index. A **DataFrame** is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [28]:
import pandas as pd

In [29]:
series = pd.Series([1, 2, 3, 4, 5])
print(series)

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [30]:
print(type(series))

<class 'pandas.core.series.Series'>


In [31]:
# Create a Series from Dictionary
data = {
    'a':1,
    'b':2,
    'c':3
}

seriesDictionary = pd.Series(data)

In [32]:
print(seriesDictionary)

a    1
b    2
c    3
dtype: int64


In [33]:
# Create a Series with a custom index
data = [10, 20, 30]
index = ['a', 'b', 'c']

pd.Series(data, index=index)

a    10
b    20
c    30
dtype: int64
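
With explicit labels in place, a Series can be read by label or by position, and arithmetic between two Series lines up values by label rather than by order. A minimal sketch; the second Series `other` is made up for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']         # label-based lookup
by_position = s.iloc[1]   # position-based lookup, the same element here

# Hypothetical second Series with the same labels in a different order
other = pd.Series([1, 2, 3], index=['c', 'b', 'a'])

# Addition matches values by label, not by position
total = s + other
print(total)
```

Because alignment is by label, `total['a']` combines `10` with `3`, not with `1`.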

In [34]:
# DataFrame

# Create a DataFrame from a dictionary of lists
data = {
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age' : [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"]
}

In [35]:
df = pd.DataFrame(data)
print(df)
print(type(df))

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala
<class 'pandas.core.frame.DataFrame'>


In [36]:
import numpy as np

np.array(df)

array([['Shivam', 19, 'Rohtak'],
       ['Varun', 18, 'Jind'],
       ['Tanuj', 21, 'Sonipat'],
       ['Sumit', 19, 'Panipat'],
       ['Sameer', 19, 'Ambala']], dtype=object)
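
`np.array(df)` falls back to `dtype=object` because the columns mix strings and integers. pandas also offers `DataFrame.to_numpy()` for the same conversion, and selecting only the numeric columns first keeps a numeric dtype. A short sketch using the same data as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age': [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"],
})

mixed = df.to_numpy()           # mixed column types -> object array
ages = df[['Age']].to_numpy()   # numeric column only -> integer array

print(mixed.dtype, mixed.shape)
print(ages.dtype, ages.shape)
```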

In [37]:
data = pd.read_csv("Details.csv")

In [38]:
data.head(5)

   Order ID  Amount  Profit  Quantity     Category      Sub-Category  PaymentMode
0   B-25681    1096     658         7  Electronics  Electronic Games          COD
1   B-26055    5729      64        14    Furniture            Chairs          EMI
2   B-25955    2927     146         8    Furniture         Bookcases          EMI
3   B-26093    2847     712         8  Electronics          Printers  Credit Card
4   B-25602    2617    1151         4  Electronics            Phones  Credit Card
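
`read_csv` accepts a file-like object as well as a path, which makes it easy to try without the file on disk. The sketch below feeds it the five preview rows through `io.StringIO`; the inline text is a stand-in for `Details.csv` and is only an assumption about how the file is laid out:

```python
import io

import pandas as pd

# Inline stand-in for Details.csv (assumed layout), built from the preview rows
csv_text = """Order ID,Amount,Profit,Quantity,Category,Sub-Category,PaymentMode
B-25681,1096,658,7,Electronics,Electronic Games,COD
B-26055,5729,64,14,Furniture,Chairs,EMI
B-25955,2927,146,8,Furniture,Bookcases,EMI
B-26093,2847,712,8,Electronics,Printers,Credit Card
B-25602,2617,1151,4,Electronics,Phones,Credit Card
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)     # (rows, columns)
print(sample.head(2))   # first two rows; head() with no argument returns five
```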


In [39]:
data.tail(5)

      Order ID  Amount  Profit  Quantity     Category  Sub-Category  PaymentMode
1495   B-25700       7      -3         2     Clothing   Hankerchief          COD
1496   B-25757    3151     -35         7     Clothing      Trousers          EMI
1497   B-25973    4141    1698        13  Electronics      Printers          COD
1498   B-25698       7      -2         1     Clothing   Hankerchief          COD
1499   B-25993    4363     305         5    Furniture        Tables          EMI


In [40]:
# Accessing Data from a DataFrame
df

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala


In [41]:
df["Name"]

0    Shivam
1     Varun
2     Tanuj
3     Sumit
4    Sameer
Name: Name, dtype: object

In [42]:
print(type(df["Name"]))

<class 'pandas.core.series.Series'>


In [43]:
df.loc[0]           # Row with index label 0 (here also the first row)

Name    Shivam
Age         19
City    Rohtak
Name: 0, dtype: object

In [44]:
df.loc[3]

Name      Sumit
Age          19
City    Panipat
Name: 3, dtype: object

In [45]:
df.iloc[0, 0]       # First row, first column, by integer position

'Shivam'
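
`loc` and `iloc` only look alike while the index is the default 0, 1, 2, and so on. `loc` selects by label and `iloc` by integer position, which shows up once the index holds something else. A sketch that reuses the table above, with `Name` promoted to the index purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age': [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"],
})

by_name = df.set_index('Name')        # index labels are now the names

row_by_label = by_name.loc['Sumit']   # lookup by label
row_by_position = by_name.iloc[3]     # fourth row by position, the same row

# Slices differ too: loc includes the end label, iloc excludes the end position
first_three = df.loc[0:2]    # rows labeled 0, 1 and 2
first_two = df.iloc[0:2]     # rows at positions 0 and 1
```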

In [46]:
df

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala


In [47]:
# Accessing a single element by row label and column label using at
df.at[1,"Age"]

np.int64(18)

In [48]:
# Accessing a single element by integer row and column position using iat
df.iat[2,2]

'Sonipat'
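
`at` and `iat` address exactly one cell, so they work for assignment as well as lookup. A small sketch on a copy of the table; the new values are arbitrary and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age': [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"],
})

updated = df.copy()             # work on a copy so df stays as it was
updated.at[1, 'Age'] = 20       # row label 1, column label 'Age'
updated.iat[2, 2] = 'Karnal'    # row position 2, column position 2 (City)

print(updated)
```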

In [49]:
df

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala


In [None]:
# Data Manipulation with DataFrames

# Adding a Column
df['Salary'] = [50000,60000,70000,80000,90000]
df

     Name  Age     City  Salary
0  Shivam   19   Rohtak   50000
1   Varun   18     Jind   60000
2   Tanuj   21  Sonipat   70000
3   Sumit   19  Panipat   80000
4  Sameer   19   Ambala   90000
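
A new column can also be derived from existing ones. `DataFrame.assign` does this without touching the original frame. The `Bonus` column and its 10% rule below are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age': [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"],
})
df['Salary'] = [50000, 60000, 70000, 80000, 90000]

# Hypothetical derived column: 10% of Salary, computed for every row at once
with_bonus = df.assign(Bonus=df['Salary'] // 10)

print(with_bonus)
print(list(df.columns))   # df itself has no Bonus column
```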


In [None]:
# Remove a column: drop() returns a new DataFrame and leaves df unchanged
df.drop('Salary', axis = 1)

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala


In [54]:
df

     Name  Age     City  Salary
0  Shivam   19   Rohtak   50000
1   Varun   18     Jind   60000
2   Tanuj   21  Sonipat   70000
3   Sumit   19  Panipat   80000
4  Sameer   19   Ambala   90000


In [None]:
# Remove a column in place: inplace=True modifies df itself and returns None
df.drop('Salary', axis = 1, inplace=True)

In [56]:
df

     Name  Age     City
0  Shivam   19   Rohtak
1   Varun   18     Jind
2   Tanuj   21  Sonipat
3   Sumit   19  Panipat
4  Sameer   19   Ambala
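
An alternative to `inplace=True` is to assign the result of `drop` to a name, which keeps the before and after frames separate. The `columns=` keyword is equivalent to passing `axis=1`. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ["Shivam", "Varun", "Tanuj", "Sumit", "Sameer"],
    'Age': [19, 18, 21, 19, 19],
    'City': ["Rohtak", "Jind", "Sonipat", "Panipat", "Ambala"],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

# Same effect as df.drop('Salary', axis=1); df is left untouched
trimmed = df.drop(columns='Salary')

# errors='ignore' skips labels that are absent instead of raising KeyError
unchanged = trimmed.drop(columns='Salary', errors='ignore')

print(list(df.columns))
print(list(trimmed.columns))
```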


In [57]:
# Increase every value in the Age column by 1
df["Age"] = df["Age"]+1

In [58]:
df

     Name  Age     City
0  Shivam   20   Rohtak
1   Varun   19     Jind
2   Tanuj   22  Sonipat
3   Sumit   20  Panipat
4  Sameer   20   Ambala


In [59]:
data = pd.read_csv("Details.csv")

In [60]:
data.head(5)

   Order ID  Amount  Profit  Quantity     Category      Sub-Category  PaymentMode
0   B-25681    1096     658         7  Electronics  Electronic Games          COD
1   B-26055    5729      64        14    Furniture            Chairs          EMI
2   B-25955    2927     146         8    Furniture         Bookcases          EMI
3   B-26093    2847     712         8  Electronics          Printers  Credit Card
4   B-25602    2617    1151         4  Electronics            Phones  Credit Card


In [62]:
# Display the data types of each column
print("Data Type: \n", data.dtypes)

Data Type: 
 Order ID        object
Amount           int64
Profit           int64
Quantity         int64
Category        object
Sub-Category    object
PaymentMode     object
dtype: object
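
Column types can be used for selection and changed explicitly. A sketch on a three-row stand-in for the order data (rows taken from the preview above), using `select_dtypes` and `astype`:

```python
import pandas as pd

# Small stand-in for the order data, not the full Details.csv
orders = pd.DataFrame({
    'Order ID': ['B-25681', 'B-26055', 'B-25955'],
    'Amount': [1096, 5729, 2927],
    'Category': ['Electronics', 'Furniture', 'Furniture'],
})

# Keep only the numeric columns
numeric = orders.select_dtypes(include='number')

# Convert columns to other types explicitly
orders['Amount'] = orders['Amount'].astype('float64')
orders['Category'] = orders['Category'].astype('category')

print(orders.dtypes)
```

Converting a repetitive text column such as `Category` to the `category` dtype stores each distinct value once, which can reduce memory use on larger tables.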


In [64]:
# Summary statistics for the numeric columns of the DataFrame
print("Statistical Summary:\n", data.describe())

Statistical Summary:
             Amount      Profit     Quantity
count  1500.000000  1500.00000  1500.000000
mean    291.847333    24.64200     3.743333
std     461.924620   168.55881     2.184942
min       4.000000 -1981.00000     1.000000
25%      47.750000   -12.00000     2.000000
50%     122.000000     8.00000     3.000000
75%     326.250000    38.00000     5.000000
max    5729.000000  1864.00000    14.000000


In [66]:
data.describe()

            Amount      Profit     Quantity
count  1500.000000  1500.00000  1500.000000
mean    291.847333    24.64200     3.743333
std     461.924620   168.55881     2.184942
min       4.000000 -1981.00000     1.000000
25%      47.750000   -12.00000     2.000000
50%     122.000000     8.00000     3.000000
75%     326.250000    38.00000     5.000000
max    5729.000000  1864.00000    14.000000
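
By default `describe()` covers only the numeric columns, which is why `Order ID`, `Category` and the other text columns are missing from the summary. Passing `include='all'` adds them, and `value_counts()` gives a tally per category. A sketch on the five preview rows, used here as a stand-in for the full file:

```python
import pandas as pd

# Stand-in built from the five preview rows, not the full 1500-row file
sample = pd.DataFrame({
    'Amount': [1096, 5729, 2927, 2847, 2617],
    'Category': ['Electronics', 'Furniture', 'Furniture',
                 'Electronics', 'Electronics'],
})

numeric_summary = sample.describe()            # numeric columns only
full_summary = sample.describe(include='all')  # text columns included
counts = sample['Category'].value_counts()     # rows per category

print(full_summary)
print(counts)
```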
