#Pandas
###Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools.

###The two primary data structures of pandas:
####Series (1-dimensional)
####DataFrame (2-dimensional)

##Series

A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.).

1.	**Labeled Indexing**: Each element in a Series has an associated label called an index, which allows for efficient data access and manipulation.
2.	**Homogeneous Data**: A Series has a single data type (`dtype`) shared by all of its elements, unlike a Python list. Values of mixed types are stored under the generic `object` dtype.
3.	**Vectorized Operations**: Series supports vectorized operations, making it efficient for numerical computations.
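
The three properties above can be seen in a few lines. This is a minimal sketch; the labels `a`, `b`, `c` and the variable names are assumptions chosen for illustration and do not come from the notebook.

```python
import pandas as pd

# Labeled indexing: supply custom labels instead of the default 0..n-1
marks = pd.Series([70, 85, 90], index=['a', 'b', 'c'])
second = marks['b']                  # access a value by its label

# Single dtype: the whole Series shares one dtype (typically int64 here)
int_dtype = marks.dtype
mixed = pd.Series([1, 'two', 3.0])   # mixed values fall back to object dtype

# Vectorized operations: arithmetic applies element-wise, no loop needed
scaled = marks * 2
```

The result of `marks * 2` keeps the original labels, so values stay aligned with their index.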


In [27]:
import pandas as pd

In [29]:
#list
data = [1, 2, 3, 4, 5]
data

[1, 2, 3, 4, 5]

In [32]:
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [34]:
pd.Series(['maths','science', 'English'])

0      maths
1    science
2    English
dtype: object

##DataFrame

A DataFrame is a table-like data structure in Python's Pandas library, organizing data into rows and columns for easy analysis and manipulation.


1.	**Tabular Structure:** DataFrames organize data in a tabular format, with rows representing observations or samples and columns representing variables or features.
2.	**Labeled Axes:** Both rows and columns of a DataFrame are labeled, allowing for easy and intuitive indexing and selection of data.
3.	**Mixed Data Types:** Different columns of a DataFrame can hold different data types (e.g., integers, floats, strings, booleans), while each column keeps a single type of its own. This provides flexibility in handling diverse datasets.
4.	**Size Mutability:** DataFrames are mutable, meaning you can modify their size by adding or removing rows and columns dynamically.
5.	**Rich Functionality:** Pandas provides a wide range of functions and methods for data manipulation, cleaning, transformation, analysis, and visualization on DataFrames.
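
A short sketch of the properties listed above. The row labels `r1`, `r2`, the `Student` column, and the variable names are hypothetical and only serve the illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Rajesh', 'Spandana'], 'Age': [21, 20], 'Student': [True, False]},
    index=['r1', 'r2'],              # hypothetical row labels
)

# Mixed data types: each column keeps its own dtype
column_types = people.dtypes

# Labeled axes: select a cell by row label and column label
rajesh_age = people.loc['r1', 'Age']

# Size mutability: add a column, then drop a row
people['City'] = ['Hyderabad', 'Chennai']
smaller = people.drop(index='r2')    # returns a new DataFrame
```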


In [54]:
#Creating a DataFrame
data = {'Name': ['Rajesh', 'Spandana', 'Rohan', 'Harshitha'], 'Age' : [21, 20, 23, 22], 'City' : ['hyderabad', 'Chennai', 'Tamilnadu', 'Delhi']}
df = pd.DataFrame(data)
df

,Name,Age,City
0,Rajesh,21,hyderabad
1,Spandana,20,Chennai
2,Rohan,23,Tamilnadu
3,Harshitha,22,Delhi


In [59]:
import pandas as pd

data = {'Name': ['Rajesh', 'Spandana', 'Rohan', 'Harshitha'],
        'Age' : [21, 20, 23, 22],
        'City' : ['Hyderabad', 'Chennai', 'Tamilnadu', 'Delhi']}

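# Adding a new column with a default city (the single value is repeated for every row)
default_city = 'Bangalore'
data['Default City'] = default_city

# Stored under a separate name so that df from the earlier cell stays unchanged
df_default = pd.DataFrame(data)
df_default

,Name,Age,City,Default City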
0,Rajesh,21,Hyderabad,Bangalore
1,Spandana,20,Chennai,Bangalore
2,Rohan,23,Tamilnadu,Bangalore
3,Harshitha,22,Delhi,Bangalore


In [39]:
#Adding a new column
df['Gender'] = ['Female', 'Male', 'Male', 'Female']
df

,Name,Age,City,Gender
0,Rajesh,21,hyderabad,Female
1,Spandana,20,Chennai,Male
2,Rohan,23,Tamilnadu,Male
3,Harshitha,22,Delhi,Female


In [42]:
#removing a column
df.drop(columns=['Age'], inplace=True)
df

,Name,City,Gender
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female


In [43]:
new_df=pd.concat([df,df])
new_df

,Name,City,Gender
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female


##Reset Index

In [46]:
new_df.reset_index(drop=True)

,Name,City,Gender
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female
4,Rajesh,hyderabad,Female
5,Spandana,Chennai,Male
6,Rohan,Tamilnadu,Male
7,Harshitha,Delhi,Female
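
Note that `reset_index` returns a new DataFrame and leaves the original untouched, so the result has to be assigned to be kept. The sketch below uses a small stand-in frame (an assumption, not the notebook's `new_df`) and also shows `ignore_index=True` as an alternative that renumbers the rows at concatenation time.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rajesh', 'Spandana'], 'City': ['hyderabad', 'Chennai']})

# Concatenating keeps each frame's original labels, so they repeat: 0, 1, 0, 1
stacked = pd.concat([df, df])

# reset_index returns a new DataFrame; assign it to keep the result
renumbered = stacked.reset_index(drop=True)

# Alternative: ask concat to renumber the rows directly
renumbered_at_concat = pd.concat([df, df], ignore_index=True)
```

Without `drop=True`, the old labels would be kept as an extra column named `index`.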


##Drop Duplicates

In [47]:
new_df1=new_df.drop_duplicates()
new_df1

,Name,City,Gender
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female
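
By default `drop_duplicates` compares entire rows and keeps the first occurrence. When only some columns should decide what counts as a duplicate, the `subset` and `keep` parameters can be used. The `visits` data below is made up for illustration.

```python
import pandas as pd

visits = pd.DataFrame({
    'Name': ['Rajesh', 'Rajesh', 'Rohan'],
    'City': ['hyderabad', 'Delhi', 'Tamilnadu'],
})

# No row is a full duplicate, so nothing is dropped by default
all_columns = visits.drop_duplicates()

# Compare on 'Name' only and keep the last occurrence of each name
by_name = visits.drop_duplicates(subset=['Name'], keep='last')
```

The surviving rows keep their original index labels, so a `reset_index(drop=True)` is often applied afterwards.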


##Drop Columns

In [49]:
new_df1.drop(['Gender'], axis=1)

,Name,City
0,Rajesh,hyderabad
1,Spandana,Chennai
2,Rohan,Tamilnadu
3,Harshitha,Delhi
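
Because `drop` is called here without `inplace=True` and without assignment, `new_df1` itself still contains `Gender`; only the displayed result lacks it. A minimal sketch of that behavior, using a small stand-in frame and a hypothetical `Salary` label that does not exist in the data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rajesh', 'Rohan'],
    'City': ['hyderabad', 'Tamilnadu'],
    'Gender': ['Female', 'Male'],
})

# drop returns a new DataFrame; df itself still has 'Gender'
without_gender = df.drop(columns=['Gender'])

# errors='ignore' skips labels that are not present instead of raising KeyError
safe = df.drop(columns=['Gender', 'Salary'], errors='ignore')
```

`df.drop(columns=[...])` is equivalent to `df.drop([...], axis=1)`.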


## Handling Null Values

In [50]:
new_df1.isnull().sum()

Name      0
City      0
Gender    0
dtype: int64


##Drop Null Values

In [52]:
new_df1.dropna()

,Name,City,Gender
0,Rajesh,hyderabad,Female
1,Spandana,Chennai,Male
2,Rohan,Tamilnadu,Male
3,Harshitha,Delhi,Female
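
Since `new_df1` contains no missing values, `dropna()` returns it unchanged. The sketch below uses a made-up frame that does contain missing values to show what `isnull`, `dropna`, and `fillna` do; the fill defaults `0` and `'Unknown'` are arbitrary choices for illustration.

```python
import pandas as pd

# None is treated as a missing value
people = pd.DataFrame({
    'Name': ['Rajesh', 'Spandana', 'Rohan'],
    'Age': [21, None, 23],
    'City': ['hyderabad', 'Chennai', None],
})

# Count missing values per column
missing_per_column = people.isnull().sum()

# Drop every row that has at least one missing value
complete_rows = people.dropna()

# Drop rows only when 'Age' is missing
has_age = people.dropna(subset=['Age'])

# Fill instead of dropping: a different default per column
filled = people.fillna({'Age': 0, 'City': 'Unknown'})
```

All three methods return new objects, so `people` itself is left as it was.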


##Vertical Concatenation

In [55]:
df1=pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2=pd.DataFrame({'A': [6, 7], 'B': [8, 9]})

In [56]:
df3=pd.concat([df1, df2], axis=0)
df3

,A,B
0,1,4
1,2,5
0,6,8
1,7,9


##Horizontal Concatenation

In [57]:
df1=pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2=pd.DataFrame({'A': [6, 7], 'B': [8, 9]})

In [58]:
df3=pd.concat([df1, df2], axis=1)
df3

,A,B,A,B
0,1,4,6,8
1,2,5,7,9
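
Horizontal concatenation keeps the column names of both inputs, so the result has two columns named `A` and two named `B`. One way to keep them apart is the `keys` parameter of `pd.concat`; the key names `left` and `right` below are arbitrary labels chosen for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
df2 = pd.DataFrame({'A': [6, 7], 'B': [8, 9]})

# Plain axis=1 concatenation keeps both sets of names: A, B, A, B
side_by_side = pd.concat([df1, df2], axis=1)

# keys adds an outer column level so each source stays addressable
labelled = pd.concat([df1, df2], axis=1, keys=['left', 'right'])
right_a = labelled[('right', 'A')]
```

With duplicate names, `side_by_side['A']` returns a two-column DataFrame rather than a Series, which is a common source of surprises.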
