### Pandas - DataFrame and Series

Pandas is a powerful data manipulation library in Python

Pandas is widely used for data analysis and data cleaning

It provides two main data structures:

    1. Series:

        A Series is a one-dimensional, array-like object with a labeled index

        A 1D NumPy array can be converted to a pandas Series and vice versa

    2. DataFrame:

        A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)

        A 2D NumPy array can be converted to a pandas DataFrame and vice versa



In [1]:
## to install pandas library
! pip install pandas



In [2]:
import pandas as pd

#### Series

1. A Series is a one-dimensional, array-like object that can hold any type of data

2. It is similar to a column in a table

In [7]:
## Creating a series from a list

data = [1,2,3,4,5]
series = pd.Series(data)
print("Series: \n", series)
print(type(series))

Series: 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [8]:
## Creating a series from a dictionary

data = {'a':1, 'b':2, 'c':3, 'd':4}
series = pd.Series(data)
print(series)
print(type(series))


a    1
b    2
c    3
d    4
dtype: int64
<class 'pandas.core.series.Series'>


In [9]:
## Another way of creating series

data = [1,2,3,4,5]
index = ['a', 'b', 'c', 'd', 'e']
pd.Series(data, index=index)


a    1
b    2
c    3
d    4
e    5
dtype: int64
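
The introduction mentions that a 1D NumPy array can be converted to a Series and back. A minimal sketch of that round trip follows; the variable names and values are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd

# A 1D NumPy array (illustrative values)
arr_1d = np.array([10, 20, 30])

# NumPy array -> Series: a default 0..n-1 index is attached
s = pd.Series(arr_1d)

# Series -> NumPy array: the index is dropped, only the values remain
back = s.to_numpy()

print(s)
print(type(back))
```

Passing `index=` to `pd.Series`, as in the cell above, would attach custom labels instead of the default integer index.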

In [18]:
## DataFrame

## Creating a dataframe using dictionary of lists

data = {
    'Name': ['Ram', 'Rahul', 'Sai'],
    'Age': [20, 23, 26],
    'City': ['Hyderabad', 'Bangalore', 'Coimbatore']
}

df = pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age        City
0    Ram   20   Hyderabad
1  Rahul   23   Bangalore
2    Sai   26  Coimbatore
<class 'pandas.core.frame.DataFrame'>


In [15]:
## To convert a dataframe to a 2D array in numpy

import numpy as np

arr = np.array(df)
print(arr)
print(type(arr))

## Note: The row index and column names are dropped in the conversion (df.to_numpy() gives the same result)

[['Ram' 20 'Hyderabad']
 ['Rahul' 23 'Bangalore']
 ['Sai' 26 'Coimbatore']]
<class 'numpy.ndarray'>


In [14]:
## To convert 2D array in numpy to dataframe

df2 = pd.DataFrame(arr)
print(df2)
print(type(df2))

## Note: Since the array carries no labels, the row index and column names default to 0, 1, 2, ...

       0   1           2
0    Ram  20   Hyderabad
1  Rahul  23   Bangalore
2    Sai  26  Coimbatore
<class 'pandas.core.frame.DataFrame'>
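
Because the array carries no labels, the round trip above loses the column names. One way to keep them, sketched here under the assumption that the original DataFrame is still available, is to pass its `columns` and `index` back to the constructor. The name `df_restored` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ram', 'Rahul', 'Sai'],
    'Age': [20, 23, 26],
    'City': ['Hyderabad', 'Bangalore', 'Coimbatore'],
})

# DataFrame -> 2D array: the labels are dropped
arr = df.to_numpy()

# 2D array -> DataFrame: supply the labels again explicitly
df_restored = pd.DataFrame(arr, columns=df.columns, index=df.index)

# The mixed-type array does not carry per-column dtypes,
# so convert the numeric column back explicitly
df_restored['Age'] = df_restored['Age'].astype(int)

print(df_restored)
print(df_restored.dtypes)
```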


In [35]:
## Creating a DataFrame from a list of dictionaries

data = [
    {'Name':'Ram', 'Age':32, 'City': 'Texas'},
    {'Name':'Seetha', 'Age':25, 'City': 'New York'},
    {'Name':'Raghu', 'Age':33, 'City': 'California'},
    {'Name':'Arjun', 'Age':28, 'City': 'Florida'},
]

df = pd.DataFrame(data)
print(df)
print(type(df))

     Name  Age        City
0     Ram   32       Texas
1  Seetha   25    New York
2   Raghu   33  California
3   Arjun   28     Florida
<class 'pandas.core.frame.DataFrame'>


In [32]:
## To read a CSV file
df1 = pd.read_csv('Energy_Production_Dataset.csv')

## To display the first 5 records of the DataFrame
df1.head(5)

         Date  Start_Hour  End_Hour  Source  Day_of_Year  Day_Name  Month_Name  Season  Production
0  11/30/2025          21        22    Wind          334    Sunday    November    Fall        5281
1  11/30/2025          18        19    Wind          334    Sunday    November    Fall        3824
2  11/30/2025          16        17    Wind          334    Sunday    November    Fall        3824
3  11/30/2025          23         0    Wind          334    Sunday    November    Fall        6120
4  11/30/2025           6         7    Wind          334    Sunday    November    Fall        4387


In [33]:
## To display the last 5 records of the DataFrame
df1.tail(5)

           Date  Start_Hour  End_Hour  Source  Day_of_Year   Day_Name  Month_Name  Season  Production
51859  1/1/2020           4         5    Wind            1  Wednesday     January  Winter        2708
51860  1/1/2020          18        19    Wind            1  Wednesday     January  Winter        1077
51861  1/1/2020           7         8    Wind            1  Wednesday     January  Winter        2077
51862  1/1/2020          14        15   Solar            1  Wednesday     January  Winter        1783
51863  1/1/2020          13        14   Solar            1  Wednesday     January  Winter        2179
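
The same `read_csv`, `head` and `tail` calls can be tried without the dataset file. The sketch below reads a small, hypothetical in-memory CSV through `io.StringIO`; its three columns are a made-up stand-in, not the real file's layout.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = """Date,Source,Production
1/1/2020,Wind,2708
1/1/2020,Wind,1077
1/1/2020,Solar,1783
1/1/2020,Solar,2179
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.head(2))  # first 2 rows
print(sample.tail(1))  # last row
```

Both `head` and `tail` take the number of rows as an argument and default to 5 when it is omitted.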


#### Accessing data from DataFrame

In [61]:
## Sample Data for a DataFrame

data = [{'Name': 'Ram', 'Age': 32, 'City': 'Texas'},
 {'Name': 'Seetha', 'Age': 25, 'City': 'New York'},
 {'Name': 'Raghu', 'Age': 33, 'City': 'California'},
 {'Name': 'Arjun', 'Age': 28, 'City': 'Florida'}]

index = ['a', 'b', 'c', 'd']

data_frame = pd.DataFrame(data, index=index)

data_frame


     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida


In [69]:
## Selecting only 'Name' column from the DataFrame
print(data_frame['Name'])
print(type(data_frame['Name']))

## Note: Selecting a single column returns a one-dimensional object, so the result is a Series


a       Ram
b    Seetha
c     Raghu
d     Arjun
Name: Name, dtype: object
<class 'pandas.core.series.Series'>
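
Selecting with a list of column names, instead of a single name, keeps the result two-dimensional. A minimal sketch with a shortened copy of the sample data follows; the names `single`, `as_frame` and `subset` are illustrative.

```python
import pandas as pd

data_frame = pd.DataFrame(
    [{'Name': 'Ram', 'Age': 32, 'City': 'Texas'},
     {'Name': 'Seetha', 'Age': 25, 'City': 'New York'}],
    index=['a', 'b'],
)

single = data_frame['Name']             # one label -> Series
as_frame = data_frame[['Name']]         # list with one label -> DataFrame
subset = data_frame[['Name', 'City']]   # list of labels -> DataFrame

print(type(single))
print(type(as_frame))
print(subset)
```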


In [71]:
data_frame

     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida


In [67]:
## Accessing rows of the DataFrame by label using loc

print(data_frame.loc['a'])
print()
print(data_frame.loc['b'])


Name      Ram
Age        32
City    Texas
Name: a, dtype: object

Name      Seetha
Age           25
City    New York
Name: b, dtype: object


In [70]:
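## Accessing the DataFrame using a position-based index (iloc)

## Accessing the 3rd record
print(data_frame.iloc[2])
print()

## Accessing the Name of the 3rd record (row position 2, column position 0)
## Note: iloc[2, 0] is used instead of the chained iloc[2][0], which raises a FutureWarning in recent pandas versions
print(data_frame.iloc[2, 0])

Name         Raghu
Age             33
City    California
Name: c, dtype: object

Raghu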


#### Difference between loc and iloc

loc is label‑based indexing, while iloc is integer‑position‑based indexing on a pandas DataFrame

loc:

    Uses row/column labels (index names and column names), not their numeric positions

    Example: if df has index ['a','b','c'], df.loc['a','col1'] selects the value at label 'a' and column 'col1'

iloc:

    Uses 0‑based integer positions for rows and columns, regardless of their labels

    Example: with any index labels, df.iloc[0, 1] always means “first row, second column by position”
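
The difference described above can be checked on a small frame. The sketch below uses a shortened copy of the sample data; the variable names are illustrative. It also shows one practical consequence: a label slice with `loc` includes its end label, while a position slice with `iloc` excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Seetha', 'Raghu', 'Arjun'],
     'Age': [32, 25, 33, 28]},
    index=['a', 'b', 'c', 'd'],
)

# Same cell, addressed by label and by position
by_label = df.loc['b', 'Age']      # row label 'b', column label 'Age'
by_position = df.iloc[1, 1]        # second row, second column
print(by_label, by_position)

# Slicing: loc includes the end label, iloc excludes the end position
loc_slice = df.loc['a':'c']        # rows 'a', 'b', 'c'
iloc_slice = df.iloc[0:2]          # rows at positions 0 and 1
print(len(loc_slice), len(iloc_slice))
```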



In [72]:
data_frame

     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida


In [76]:
## Accessing a specific element using at

print(data_frame.at['a', 'Name'])
print()
print(data_frame.at['c', 'Age'])

Ram

33


In [80]:
## Accessing a specific element using iat

print(data_frame.iat[0,0])
print()
print(data_frame.iat[2,1])

Ram

33
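
`at` and `iat` address a single cell, by label and by position respectively, and they can also be used to assign a value. A minimal sketch on a shortened, illustrative copy of the data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Seetha'], 'Age': [32, 25]},
    index=['a', 'b'],
)

# Update one cell by label
df.at['a', 'Age'] = 40

# Update one cell by position (second row, second column)
df.iat[1, 1] = 26

print(df)
```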


In [81]:
data_frame

     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida


In [95]:
## Data Manipulation with DataFrame

## Adding a new salary column

data_frame['Salary'] = [50000, 85000, 120000, 200000]
data_frame

     Name  Age        City  Salary
a     Ram   32       Texas   50000
b  Seetha   25    New York   85000
c   Raghu   33  California  120000
d   Arjun   28     Florida  200000


In [87]:
## To remove the salary column

data_frame.drop('Salary')

## This raises a KeyError because drop() uses axis=0 by default, so it looks for 'Salary' among the row labels
## 'Salary' is a column label, so we need to pass axis=1

KeyError: "['Salary'] not found in axis"

In [89]:
## Setting axis=1 for column index check

data_frame.drop('Salary', axis=1)

## The drop succeeds, but it returns a new DataFrame; data_frame itself is unchanged

     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida


In [91]:
## To check whether the Salary attribute has been deleted permanently

data_frame

     Name  Age        City  Salary
a     Ram   32       Texas   50000
b  Seetha   25    New York   85000
c   Raghu   33  California  120000
d   Arjun   28     Florida  200000


In [96]:
## To permanently delete the Salary attribute

data_frame.drop('Salary', axis=1, inplace=True)
data_frame

     Name  Age        City
a     Ram   32       Texas
b  Seetha   25    New York
c   Raghu   33  California
d   Arjun   28     Florida
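
As an alternative to `axis` and `inplace=True`, `drop` also accepts the `columns=` and `index=` keywords, and the result can simply be assigned back to the variable. The sketch below uses a shortened, illustrative copy of the data and is not the notebook's own approach.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ram', 'Seetha'], 'Age': [32, 25], 'Salary': [50000, 85000]},
    index=['a', 'b'],
)

# columns= is equivalent to drop('Salary', axis=1)
no_salary = df.drop(columns='Salary')

# index= is equivalent to drop('a') with the default axis=0
no_row_a = df.drop(index='a')

# Assigning the result back keeps the change without inplace=True
df = df.drop(columns='Salary')

print(df)
```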


In [97]:
## To update the Age after a year

data_frame['Age'] = data_frame['Age']+1

In [98]:
data_frame

     Name  Age        City
a     Ram   33       Texas
b  Seetha   26    New York
c   Raghu   34  California
d   Arjun   29     Florida


In [100]:
## To drop a record based on row index

data_frame.drop('a')

## Note: This is not permanent; to modify data_frame itself, set inplace=True

     Name  Age        City
b  Seetha   26    New York
c   Raghu   34  California
d   Arjun   29     Florida


In [104]:
## Display the datatypes of each column

print('DataTypes:\n',data_frame.dtypes)

DataTypes:
 Name    object
Age      int64
City    object
dtype: object


In [106]:
## Describe the statistical summary of the DataFrame

print('Statistical Summary:\n', data_frame.describe())

Statistical Summary:
              Age
count   4.000000
mean   30.500000
std     3.696846
min    26.000000
25%    28.250000
50%    31.000000
75%    33.250000
max    34.000000
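
By default `describe()` summarizes only the numeric columns, which is why only Age appears above. A minimal sketch of including the text columns as well, using `include='all'` on a copy of the sample data (the name `summary_all` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ram', 'Seetha', 'Raghu', 'Arjun'],
    'Age': [33, 26, 34, 29],
    'City': ['Texas', 'New York', 'California', 'Florida'],
})

# include='all' adds count/unique/top/freq rows for non-numeric columns
summary_all = df.describe(include='all')
print(summary_all)

# info() prints column dtypes and non-null counts
df.info()
```

Statistics that do not apply to a column, such as the mean of Name, are shown as NaN in the combined summary.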
