# PANDAS

## Definition:
pandas is a widely used Python library for data manipulation and analysis. It provides two core data structures, Series and DataFrame, which are built on top of NumPy arrays and make it easy to work with structured data. Here's an introduction to pandas:

## Installation:

You can install pandas using the following pip command:

In [1]:
pip install pandas





## Importing pandas:

After installation, you can import pandas in your Python script or Jupyter notebook:

In [2]:
import pandas as pd


## pandas Series:

A pandas Series is a one-dimensional labeled array that can hold any data type. It is similar to a one-dimensional NumPy array, but each element also carries a label, and together these labels form the index.

In [3]:
# Creating a pandas Series from a Python list
my_list = [10, 20, 30, 40, 50]
my_series = pd.Series(my_list)
print(my_series)


0    10
1    20
2    30
3    40
4    50
dtype: int64
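The labels do not have to be the default integers. As a minimal sketch (the labels `"a"`, `"b"`, `"c"` and the variable names are illustrative assumptions, not part of the original notebook), a Series can be given a custom index and then read either by label or by position:

```python
import pandas as pd

# Hypothetical labels chosen for illustration
labeled_series = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Label-based lookup uses the index
value_b = labeled_series["b"]

# Position-based lookup uses .iloc
first_value = labeled_series.iloc[0]

print(value_b)      # 20
print(first_value)  # 10
```

Using `.iloc` for positions keeps the two kinds of lookup explicit, which helps when the labels themselves are integers.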


## Series Attributes:

Similar to NumPy arrays, pandas Series have attributes like index and values.

In [5]:
print(my_series.index)   # Returns the index of the Series
print(my_series.values)  # Returns the values of the Series


RangeIndex(start=0, stop=5, step=1)
[10 20 30 40 50]
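A few other attributes are often useful alongside `index` and `values`. The sketch below assumes a Series named `"scores"` (an illustrative name) and shows `dtype`, `shape`, and `name`:

```python
import pandas as pd

# "scores" is an illustrative name, not taken from the original notebook
scores = pd.Series([10, 20, 30, 40, 50], name="scores")

print(scores.dtype)  # element type of the underlying array
print(scores.shape)  # (5,)
print(scores.name)   # scores

# to_numpy() returns the data as a NumPy array
as_array = scores.to_numpy()
print(type(as_array).__name__)  # ndarray
```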


## pandas DataFrame:

A pandas DataFrame is a two-dimensional table with labeled rows and columns, where each column is a Series. It is one of the most commonly used structures in pandas.

In [7]:
# Creating a pandas DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
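Once a DataFrame exists, a quick look at its size, column names, and column types is usually the first step. The following sketch rebuilds the same sample data so that it runs on its own:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

print(df.shape)          # (3, 3): three rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'City']
print(df.dtypes)         # one dtype per column

# head(n) returns the first n rows
first_two = df.head(2)
print(first_two)
```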


## DataFrame Operations:

pandas provides a wide range of operations for DataFrame manipulation, including indexing, selecting, and filtering data.

In [5]:
# Selecting a column
age_column = df['Age']

# Filtering rows based on a condition
young_people = df[df['Age'] < 30]


print(young_people)


    Name  Age      City
0  Alice   25  New York
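Selection and filtering can also be combined. The sketch below, which reuses the same sample data, shows selecting several columns, filtering with `.loc`, combining two conditions, and reading a row by position with `.iloc`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Selecting several columns returns a new DataFrame
name_and_city = df[['Name', 'City']]

# .loc takes a row condition and a column label together
over_25_names = df.loc[df['Age'] > 25, 'Name']

# Conditions are combined with & (and) or | (or); each needs parentheses
mask = (df['Age'] >= 30) & (df['City'] != 'Los Angeles')
filtered = df[mask]

# .iloc selects by integer position
first_row = df.iloc[0]

print(over_25_names.tolist())     # ['Bob', 'Charlie']
print(filtered['Name'].tolist())  # ['Bob']
print(first_row['Name'])          # Alice
```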


## Handling Missing Data:

pandas has built-in methods for handling missing data, such as dropna() and fillna().

In [19]:
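# Building a small DataFrame with one missing value.
# None marks the missing entry (pd.NaT is meant for datetime data).
# A separate name (toys_df) keeps the earlier `df` intact for later cells.
toys_df = pd.DataFrame({"name": ['supername', 'name'],
                        "toy": [None, 'cat']})

# Dropping rows with missing values
df_without_missing = toys_df.dropna()

# Filling missing values with a specific value
df_filled = toys_df.fillna('cat')

print(df_without_missing)
print(df_filled)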



   name  toy
1  name  cat
        name  toy
0  supername  cat
1       name  cat
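Before dropping or filling, it often helps to count the missing values, and different columns may need different fill values. The sketch below uses a hypothetical `toys` DataFrame with an extra `price` column (both are assumptions for illustration) to show `isna()`, a per-column `fillna()`, and `dropna()` restricted to one column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing string and one missing number
toys = pd.DataFrame({
    "name": ["supername", "name", "other"],
    "toy": [None, "cat", "ball"],
    "price": [5.0, np.nan, 7.0],
})

# Count missing values in each column
missing_per_column = toys.isna().sum()
print(missing_per_column)

# Fill each column with its own replacement value
filled = toys.fillna({"toy": "unknown", "price": 0.0})
print(filled)

# Drop rows only when 'toy' is missing
toy_known = toys.dropna(subset=["toy"])
print(toy_known)
```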


## Grouping and Aggregation:

pandas supports grouping of data based on certain criteria, and you can perform aggregations on the grouped data.

In [12]:
# Grouping by 'City' and calculating the average age
average_age_by_city = df.groupby('City')['Age'].mean()
print(average_age_by_city)


City
Los Angeles      35.0
New York         25.0
San Francisco    30.0
Name: Age, dtype: float64
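Because each city appears only once in the sample data, every group above contains a single row. The sketch below uses a hypothetical `people` DataFrame with repeated cities (the extra row and the names are assumptions) and computes several aggregations at once with `agg()`:

```python
import pandas as pd

# Hypothetical data with two people per city
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 45],
    "City": ["New York", "San Francisco", "New York", "San Francisco"],
})

# One row per city, one column per aggregation
summary = people.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(summary)

print(summary.loc["New York", "mean"])       # 30.0
print(summary.loc["San Francisco", "mean"])  # 37.5
```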


## Merging and Concatenation:

pandas provides functions for merging and concatenating DataFrames, allowing you to combine data from different sources.

In [20]:
# Merging two DataFrames based on a common column
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

# Concatenating DataFrames vertically (columns missing from one frame are filled with NaN)
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)


   ID     Name  Age
0   2      Bob   25
1   3  Charlie   30
   ID     Name   Age
0   1    Alice   NaN
1   2      Bob   NaN
2   3  Charlie   NaN
0   2      NaN  25.0
1   3      NaN  30.0
2   4      NaN  35.0
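By default `pd.merge` performs an inner join, which is why only IDs 2 and 3 appear in the merged output, and `pd.concat` keeps each frame's original index, which is why the labels 0 to 2 repeat. A short sketch of the alternatives, using the same two DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [25, 30, 35]})

# Outer join keeps IDs from both frames; indicator adds a '_merge' column
# that records where each row came from
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer)

# Left join keeps every row of df1, matched or not
left = pd.merge(df1, df2, on='ID', how='left')
print(left)

# ignore_index=True renumbers the rows 0..n-1 after concatenation
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```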


## IO Operations:

pandas supports reading and writing data in various formats, including CSV, Excel, SQL, and more.

In [None]:
# Reading data from a CSV file
data_from_csv = pd.read_csv('data.csv')

# Writing data to an Excel file
df.to_excel('output.xlsx', index=False)
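The cell above assumes that a file named `data.csv` exists, and writing `.xlsx` files relies on an optional Excel engine such as openpyxl being installed. As a sketch that runs without any files on disk, the example below writes the sample DataFrame to an in-memory text buffer and reads it back with the same `to_csv` and `read_csv` functions:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV back into a new DataFrame
buffer.seek(0)
round_tripped = pd.read_csv(buffer)
print(round_tripped)
```

Passing `index=False` avoids writing the row labels as an extra column, so the data read back has the same three columns as the original.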


In [19]:
import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Writing the DataFrame to an Excel file
df.to_excel('output.xlsx', index=False)




In [2]:
import pandas as pd

# read_excel already returns a DataFrame, so no pd.DataFrame() wrapper is needed
df = pd.read_excel(r"C:\Users\Administrator\Desktop\sample.xlsx")

In [3]:
df

Unnamed: 0  ID  First Name  Last Name   Hire Date  Start Salary            Email         Phone
         0   3       Julie      Rhatt  2023-04-23           NaN  rhatt@gmail.com  1234568000.0
         1   4         Tom      Jerry  2023-06-04       30000.0    tom@gmail.com  3456789000.0
         2   5         Cat        NaN  2023-06-07       20000.0    cat@gmail.com           NaN
         3   6       Julie      Rhatt  2023-06-25       30000.0  rhatt@gmail.com  1234568000.0
         4   7       rohit          y         NaT       20000.0        rohit@123   123456800.0


In [5]:
df.fillna('MISSING')

Unnamed: 0  ID  First Name  Last Name            Hire Date  Start Salary            Email         Phone
         0   3       Julie      Rhatt  2023-04-23 00:00:00       MISSING  rhatt@gmail.com  1234567892.0
         1   4         Tom      Jerry  2023-06-04 00:00:00       30000.0    tom@gmail.com  3456789123.0
         2   5         Cat    MISSING  2023-06-07 00:00:00       20000.0    cat@gmail.com       MISSING
         3   6       Julie      Rhatt  2023-06-25 00:00:00       30000.0  rhatt@gmail.com  1234567892.0
         4   7       rohit          y              MISSING       20000.0        rohit@123   123456789.0
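Filling every column with the same string also places text into numeric and date columns, as the `Start Salary`, `Hire Date`, and `Phone` cells above show. One alternative is to pass a dictionary so that each column gets a suitable value. The sketch below uses a small hypothetical `staff` DataFrame modelled on three of the columns (the rows and the chosen fill values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical rows modelled on the sample sheet's columns
staff = pd.DataFrame({
    "First Name": ["Julie", "Cat"],
    "Last Name": ["Rhatt", None],
    "Start Salary": [np.nan, 20000.0],
})

# One string for everything: the salary column now mixes numbers and text
blanket = staff.fillna("MISSING")
print(blanket)

# Per-column values: text for names, a number for the salary
targeted = staff.fillna({"Last Name": "MISSING", "Start Salary": 0.0})
print(targeted)
print(targeted["Start Salary"].dtype)  # float64
```

Keeping the salary column numeric means later calculations such as `mean()` or `sum()` still work on it.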
