# EDA (Exploratory Data Analysis)
# Pandas


* ## Install Pandas

In [None]:
%pip install pandas



# Series
A Series is a one-dimensional labeled array capable of holding any data type (e.g., integers, strings, floats, or Python objects). It is similar to a one-dimensional NumPy array, but with additional features such as axis labels, which together form the index.
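
As a minimal sketch of what "labeled array" means, the cell below builds a Series with an explicit index and inspects its labels and its underlying values. The variable name temps and the sample readings are made up for illustration.

```python
import pandas as pd

# Each value is paired with a label; the labels are stored in .index
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"], name="temp_c")

labels = temps.index.tolist()   # the axis labels as a plain list
values = temps.to_numpy()       # the underlying NumPy array of values

print(labels)
print(values)
print(temps["tue"])             # look a value up by its label
```

The values behave like a NumPy array, while the index is what lets you address them by name rather than by position.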

In [None]:
import pandas as pd


* ###  Creating Series from List

In [None]:
data = [10, 20, 30, 40, 50]
series_from_list = pd.Series(data)

print("Series created from a list:")
print(series_from_list)


Series created from a list:
0    10
1    20
2    30
3    40
4    50
dtype: int64


* ###  Creating Series from Dictionary

In [None]:
data_dict = {'a': 100, 'b': 200, 'c': 300, 'd': 400}
series_from_dict = pd.Series(data_dict)

print("Series created from a dictionary:")
print(series_from_dict)

Series created from a dictionary:
a    100
b    200
c    300
d    400
dtype: int64


* ### Accessing Elements from Series

In [None]:
print("Accessing elements in the Series:")
print("Element at index 2:", series_from_list[2])
print("Element with label 'b':", series_from_dict['b'])

Accessing elements in the Series:
Element at index 2: 30
Element with label 'b': 200
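
Square brackets can mean either a label or a position depending on the index, so it may help to see the two explicit accessors side by side. This is a small sketch using a throwaway Series named s (not part of the notebook above): .loc selects by label and .iloc selects by integer position.

```python
import pandas as pd

s = pd.Series([100, 200, 300, 400], index=["a", "b", "c", "d"])

by_label = s.loc["b"]          # label-based lookup
by_position = s.iloc[1]        # position-based lookup (0-indexed)

label_slice = s.loc["b":"c"]   # label slices include both endpoints
pos_slice = s.iloc[1:3]        # positional slices exclude the stop position

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

Both slices select the same two elements here, but they get there by different rules, which matters once the index labels are themselves integers.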


* ### Arithmetic Operation on Series

In [None]:
print("Performing operations on the Series:")
squared_series = series_from_list ** 2
print("Squared Series:")
print(squared_series)

Performing operations on the Series:
Squared Series:
0     100
1     400
2     900
3    1600
4    2500
dtype: int64
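
Arithmetic between two Series is aligned on the index labels rather than on position. The sketch below uses two illustrative Series, a and b, whose labels only partly overlap, and shows one way to handle the labels that appear on only one side.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Labels that are missing from either operand produce NaN
total = a + b

# Series.add with fill_value treats a missing operand as 0 instead
filled = a.add(b, fill_value=0)

print(total)
print(filled)
```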


* ### Accessing Elements from Series based on Conditions

In [None]:
print("Filtering values in the Series:")
filtered_series = series_from_list[series_from_list > 30]
print("Values greater than 30:")
print(filtered_series)

Filtering values in the Series:
Values greater than 30:
3    40
4    50
dtype: int64
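
The comparison inside the brackets produces a boolean Series (a mask), and several masks can be combined. The following sketch, with an illustrative Series s, uses & for "and" and ~ for "not"; each comparison needs its own parentheses because of operator precedence.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Element-wise boolean mask: True where 10 < value < 50
mask = (s > 10) & (s < 50)

between = s[mask]     # values where the mask is True
outside = s[~mask]    # values where the mask is False

print(between)
print(outside)
```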


* ### Series Size is Mutable

In [None]:
initial_data = [10, 20, 30, 40, 50]
labels = ['a', 'b', 'c', 'd', 'e']
mutable_series = pd.Series(initial_data, index=labels)

print("Initial Series:")
print(mutable_series)
print()

mutable_series['f'] = 60
print(mutable_series)

Initial Series:
a    10
b    20
c    30
d    40
e    50
dtype: int64

a    10
b    20
c    30
d    40
e    50
f    60
dtype: int64


In [None]:
mutable_series['a'] = 100
mutable_series['b'] = 200

print("Modified Series:")
print(mutable_series)

Modified Series:
a    100
b    200
c     30
d     40
e     50
f     60
dtype: int64
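
Elements can be removed as well as added. As a small sketch with an illustrative Series s, Series.drop returns a new Series without the given label and leaves the original unchanged.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# drop returns a new Series; s itself still has three elements
shorter = s.drop("b")

print(shorter)
print(len(s))
```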


# DataFrame
A pandas DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It consists of rows and columns, and each column can hold a different data type (e.g., integer, float, or string). You can think of it as a dictionary of Series objects that share one index, where each Series is a column.
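
To make the "dictionary of Series" picture concrete, here is a minimal sketch that builds a DataFrame from two Series. The names names, ages and people are illustrative only.

```python
import pandas as pd

names = pd.Series(["Alice", "Bob"])
ages = pd.Series([25, 30])

# Dictionary keys become column labels; the Series are aligned on their index
people = pd.DataFrame({"Name": names, "Age": ages})

print(people)
print(people.dtypes)   # each column keeps its own data type
```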

* ### Creating a DataFrame from a dictionary

In [None]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

print("DataFrame created from a dictionary:")
df

DataFrame created from a dictionary:


Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,San Francisco
2,Charlie,35,Los Angeles
3,David,28,Chicago


* ### Accessing columns in a DataFrame


In [None]:
print("Accessing columns in the DataFrame:")
print("Name column:")
df['Name']

Accessing columns in the DataFrame:
Name column:


Unnamed: 0,Name
0,Alice
1,Bob
2,Charlie
3,David
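
Selecting with a single column name returns a Series, while selecting with a list of names returns a DataFrame. The sketch below rebuilds a small frame (df_demo, an illustrative name) so that it runs on its own.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "San Francisco"],
})

one_column = df_demo["Name"]            # a single label gives a Series
two_columns = df_demo[["Name", "Age"]]  # a list of labels gives a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```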


* ### Adding a new column to the DataFrame

In [None]:
df['Salary'] = [60000, 70000, 80000, 65000]
print("DataFrame with a new 'Salary' column:")
df

DataFrame with a new 'Salary' column:


Unnamed: 0,Name,Age,City,Salary
0,Alice,25,New York,60000
1,Bob,30,San Francisco,70000
2,Charlie,35,Los Angeles,80000
3,David,28,Chicago,65000
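
A new column can also be computed from existing ones instead of typed in as a list. This sketch uses an illustrative frame df_demo and hypothetical column names Monthly and Raised; DataFrame.assign returns a copy with the extra column rather than modifying the frame in place.

```python
import pandas as pd

df_demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [60000, 72000]})

# Vectorized arithmetic on a column produces a new Series of the same length
df_demo["Monthly"] = df_demo["Salary"] / 12

# assign leaves df_demo unchanged and returns a new DataFrame
with_raise = df_demo.assign(Raised=df_demo["Salary"] + 5000)

print(df_demo)
print(with_raise)
```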


* ### Descriptive statistics of the DataFrame


In [None]:
print("Descriptive statistics of the DataFrame:")
df.describe()

Descriptive statistics of the DataFrame:


Unnamed: 0,Age,Salary
count,4.0,4.0
mean,29.5,68750.0
std,4.203173,8539.125638
min,25.0,60000.0
25%,27.25,63750.0
50%,29.0,67500.0
75%,31.25,72500.0
max,35.0,80000.0
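
Each row of the describe() table corresponds to a method that can be called on its own. As a sketch, the cell below recomputes a few of the Age statistics from the same four values; note that std() is the sample standard deviation (it divides by n - 1), which is why it matches the table.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 28])

mean_age = ages.mean()
std_age = ages.std()            # sample standard deviation (ddof=1)
q1_age = ages.quantile(0.25)    # linear interpolation between data points
median_age = ages.median()

print(mean_age, std_age, q1_age, median_age)
```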


* ### Filtering rows in the DataFrame based on conditions


In [None]:
print("Filtering rows in the DataFrame:")
filtered_df = df[df['Age'] > 30]
print("Rows with Age greater than 30:")
filtered_df

Filtering rows in the DataFrame:
Rows with Age greater than 30:


Unnamed: 0,Name,Age,City,Salary
2,Charlie,35,Los Angeles,80000
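
Row filters can combine several conditions and can select columns at the same time. The sketch below rebuilds the example data under the illustrative name df_demo; | means "or", and .loc accepts a boolean mask for the rows together with a column label.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "San Francisco", "Los Angeles", "Chicago"],
})

# Rows where either condition holds; each condition needs parentheses
young_or_la = df_demo[(df_demo["Age"] < 30) | (df_demo["City"] == "Los Angeles")]

# Filter rows and pick one column in a single step
names_30_plus = df_demo.loc[df_demo["Age"] >= 30, "Name"]

print(young_or_la)
print(names_30_plus)
```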


* ### from_dict()
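DataFrame.from_dict() is a class method that creates a DataFrame from a dictionary. With the default orient='columns', the dictionary keys become the column labels. For comparison, the first example below uses the regular pd.DataFrame constructor, which also accepts a dictionary and lets you set the row labels with index.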

In [None]:
data = { 'apples':[3, 2, 0, 1] , 'oranges':[0, 3, 7, 2] }

In [None]:
df = pd.DataFrame(data, index=['Delhi', 'Ahmedabad', 'Mumbai', 'Kolkata'])
print(df)

           apples  oranges
Delhi           3        0
Ahmedabad       2        3
Mumbai          0        7
Kolkata         1        2


In [None]:
data = {'col_1':[3, 2, 1, 0], 'col_2':['a','b','c','d']}
pd.DataFrame.from_dict(data)

Unnamed: 0,col_1,col_2
0,3,a
1,2,b
2,1,c
3,0,d


With orient='index', the dictionary keys become the row labels and each value becomes one row of the DataFrame. The optional columns argument, which is only accepted together with orient='index', names the resulting columns.

In [None]:
data = {'row_1':[3, 2, 1, 0], 'row_2':['a','b','c','d']}
pd.DataFrame.from_dict(data,orient='index')

Unnamed: 0,0,1,2,3
row_1,3,2,1,0
row_2,a,b,c,d


In [None]:
data = {'row_1':[3, 2, 1, 0], 'row_2':['a','b','c','d']}
pd.DataFrame.from_dict(data, orient = 'index',columns = ['A','B','C','D'])

Unnamed: 0,A,B,C,D
row_1,3,2,1,0
row_2,a,b,c,d
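
The two orientations can be compared directly on the same dictionary. This is a small sketch with made-up data; by_cols and by_rows are illustrative names.

```python
import pandas as pd

data = {"row_1": [3, 2], "row_2": [1, 0]}

# Default orient='columns': keys become column labels
by_cols = pd.DataFrame.from_dict(data)

# orient='index': keys become row labels, and columns names the columns
by_rows = pd.DataFrame.from_dict(data, orient="index", columns=["A", "B"])

print(by_cols)
print(by_rows)
```

The same values end up transposed: the list stored under 'row_1' is a column in by_cols and a row in by_rows.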
