
**1. Getting Familiar with Pandas**

Pandas is a powerful data manipulation and analysis library for Python. It provides two primary data structures, Series and DataFrame; the examples below show how to create each.

**Creating a Series**:

A Series is a one-dimensional array-like structure with labeled indices.



```python
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Series:\n", series)

# Creating a Series with a custom index
series_with_index = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print("\nSeries with Custom Index:\n", series_with_index)
```

```text
Series:
 0    10
1    20
2    30
3    40
4    50
dtype: int64

Series with Custom Index:
 a    10
b    20
c    30
d    40
e    50
dtype: int64
```
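
Because the index is labeled, a Series can be read either by label or by position. The sketch below reuses the values from the example above (the variable name `s` is only for illustration) to show `.loc`, `.iloc`, element-wise arithmetic, and boolean filtering:

```python
import pandas as pd

# Same values and labels as the custom-index Series above
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Label-based access with .loc (label slices include the end label)
print(s.loc['b'])                # 20
print(s.loc['b':'d'].tolist())   # [20, 30, 40]

# Position-based access with .iloc (position slices exclude the end)
print(s.iloc[0])                 # 10
print(s.iloc[1:3].tolist())      # [20, 30]

# Arithmetic is applied element-wise and keeps the index labels
doubled = s * 2
print(doubled.tolist())          # [20, 40, 60, 80, 100]

# A boolean mask keeps only the entries that satisfy the condition
print(s[s > 25].index.tolist())  # ['c', 'd', 'e']
```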


**Creating a DataFrame**:

A DataFrame is a two-dimensional, size-mutable tabular data structure with labeled rows and columns. It is potentially heterogeneous: each column can hold a different data type.

```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Aneel', 'Bharath', 'Charan'],
    'Age': [25, 30, 35],
    'City': ['sklm', 'vzm', 'vskp']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
```

```text
DataFrame:
       Name  Age  City
0    Aneel   25  sklm
1  Bharath   30   vzm
2   Charan   35  vskp
```
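
To see what "heterogeneous" and "size-mutable" mean in practice, the following sketch rebuilds the same DataFrame, checks its per-column types, selects and filters data, and then adds a column. The new column name `Age_Next_Year` is purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aneel', 'Bharath', 'Charan'],
    'Age': [25, 30, 35],
    'City': ['sklm', 'vzm', 'vskp']
})

# Each column carries its own dtype (text columns vs. the integer 'Age' column)
print(df.dtypes)

# One column label returns a Series; a list of labels returns a DataFrame
ages = df['Age']
subset = df[['Name', 'City']]

# Boolean filtering keeps only the rows where the condition holds
older = df[df['Age'] > 28]
print(older)

# Size-mutable: a column can be added after the DataFrame is created
df['Age_Next_Year'] = df['Age'] + 1
print(df.shape)  # (3, 4)
```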


**2. Data Handling with Pandas**

**Handling Missing Data**:

You can handle missing data (`NaN`) either by filling it with `fillna()` or by dropping the affected rows with `dropna()`.


```python
import pandas as pd
import numpy as np

# Creating a DataFrame with missing values
data = {
    'Name': ['Aneel', 'Bharath', 'Charan', np.nan],
    'Age': [25, np.nan, 35, 40],
    'City': ['sklm', 'vzm', np.nan, 'vskp']
}
df = pd.DataFrame(data)

# Filling missing values: a placeholder for Name, the column mean for Age (City is left as is)
df_filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean()})
print("\nDataFrame with Missing Values Filled:\n", df_filled)

# Dropping every row that has at least one missing value
df_dropped = df.dropna()
print("\nDataFrame with Missing Values Dropped:\n", df_dropped)
```

```text
DataFrame with Missing Values Filled:
       Name        Age  City
0    Aneel  25.000000  sklm
1  Bharath  33.333333   vzm
2   Charan  35.000000   NaN
3  Unknown  40.000000  vskp

DataFrame with Missing Values Dropped:
     Name   Age  City
0  Aneel  25.0  sklm
```
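
Before filling or dropping, it usually helps to see where the gaps are. This sketch, on the same data as above, counts missing values per column, drops rows only when one specific column is missing, and shows forward-filling as an alternative to a constant fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aneel', 'Bharath', 'Charan', np.nan],
    'Age': [25, np.nan, 35, 40],
    'City': ['sklm', 'vzm', np.nan, 'vskp']
})

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Drop only the rows where 'Age' is missing; gaps in other columns are kept
df_age_known = df.dropna(subset=['Age'])
print(df_age_known)

# Forward-fill: replace each missing City with the last value seen above it
city_ffilled = df['City'].ffill()
print(city_ffilled.tolist())  # ['sklm', 'vzm', 'vzm', 'vskp']
```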


**Removing Duplicates:**

```python
import pandas as pd

# Creating a DataFrame with duplicate rows
data = {
    'Name': ['Aneel', 'Bharath', 'Aneel', 'Charan'],
    'Age': [25, 30, 25, 35]
}
df = pd.DataFrame(data)

# Removing duplicate rows
df_unique = df.drop_duplicates()
print("\nDataFrame with Duplicates Removed:\n", df_unique)
```

```text
DataFrame with Duplicates Removed:
       Name  Age
0    Aneel   25
1  Bharath   30
3   Charan   35
```
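
`drop_duplicates()` keeps the first occurrence of each repeated row by default. A short sketch, on the same data, of the related `duplicated()` method and the `keep` and `subset` parameters:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aneel', 'Bharath', 'Aneel', 'Charan'],
    'Age': [25, 30, 25, 35]
})

# duplicated() marks each row that repeats an earlier row
dup_mask = df.duplicated()
print(dup_mask.tolist())          # [False, False, True, False]

# keep='last' keeps the final occurrence and drops the earlier ones
keep_last = df.drop_duplicates(keep='last')
print(keep_last.index.tolist())   # [1, 2, 3]

# subset compares only the listed columns when deciding what is a duplicate
unique_names = df.drop_duplicates(subset=['Name'])
print(unique_names.index.tolist())  # [0, 1, 3]
```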


**Data Type Conversions:**

```python
import pandas as pd

# Creating a DataFrame whose 'Age' column holds numbers stored as strings
data = {
    'Age': ['25', '30', '35', '40']
}
df = pd.DataFrame(data)

# Converting the strings to floating-point numbers
df['Age'] = df['Age'].astype(float)
print("\nDataFrame with Converted Data Types:\n", df)
```

```text
DataFrame with Converted Data Types:
     Age
0  25.0
1  30.0
2  35.0
3  40.0
```
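
`astype(float)` works here because every string is a valid number. Real data often contains entries that are not; in that case `pd.to_numeric` with `errors='coerce'` converts what it can and marks the rest as `NaN`. The value `'unknown'` below is a made-up example of such an entry:

```python
import pandas as pd

# Hypothetical messy input: 'unknown' cannot be parsed as a number
df = pd.DataFrame({'Age': ['25', '30', 'unknown', '40']})

# astype(float) raises a ValueError on the unparseable string
try:
    df['Age'].astype(float)
except ValueError as exc:
    print("astype failed:", exc)

# to_numeric with errors='coerce' converts valid strings and turns the rest into NaN
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df)
print("Missing after conversion:", df['Age'].isna().sum())
```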


**3. Data Analysis with Pandas**

`describe()` generates summary statistics for each numeric column: the count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median), and 75th percentiles.

```python
import pandas as pd

# Creating a DataFrame
data = {
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Summary statistics
print("\nSummary Statistics:\n", df.describe())
```

```text
Summary Statistics:
              Age        Salary
count   5.000000      5.000000
mean   35.000000  70000.000000
std     7.905694  15811.388301
min    25.000000  50000.000000
25%    30.000000  60000.000000
50%    35.000000  70000.000000
75%    40.000000  80000.000000
max    45.000000  90000.000000
```
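
Each figure in the `describe()` table can also be computed on its own, which is handy when only one statistic is needed. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

# Individual statistics for one column
print(df['Age'].mean())           # 35.0
print(df['Age'].median())         # 35.0
print(df['Age'].quantile(0.25))   # 30.0

# std() uses the sample formula (ddof=1), the same one describe() reports
print(round(df['Age'].std(), 6))  # 7.905694

# describe() can report other percentiles on request (the median is always included)
print(df.describe(percentiles=[0.1, 0.9]))
```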


**Grouping Data and Applying Aggregates:**

```python
import pandas as pd

# Creating a DataFrame
data = {
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 65000, 55000, 70000]
}
df = pd.DataFrame(data)

# Group by 'Department' and calculate the mean salary
grouped = df.groupby('Department').mean()
print("\nGrouped Data and Aggregates:\n", grouped)
```

```text
Grouped Data and Aggregates:
              Salary
Department         
Finance     70000.0
HR          52500.0
IT          62500.0
```
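
When one aggregate is not enough, `agg()` with named aggregation computes several per group in a single call. The output column names (`mean_salary`, `max_salary`, `headcount`) are arbitrary choices for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 65000, 55000, 70000]
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('Salary', 'count'),
)
print(summary)
```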


**Merging, Joining, and Concatenating DataFrames:**

```python
import pandas as pd

# Creating DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aneel', 'Bharath', 'Charan']
})

df2 = pd.DataFrame({
    'ID': [4, 5],
    'Name': ['mohan', 'lohith']
})

# Concatenating DataFrames
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated DataFrame:\n", concatenated_df)
```

```text
Concatenated DataFrame:
    ID     Name
0   1    Aneel
1   2  Bharath
2   3   Charan
3   4    mohan
4   5   lohith
```
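
The example above covers concatenation only. Merging and joining combine DataFrames by matching keys rather than stacking them. In this sketch the `salaries` table and its values are hypothetical, chosen so that some IDs match and some do not:

```python
import pandas as pd

employees = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aneel', 'Bharath', 'Charan']
})

# Hypothetical second table keyed on the same 'ID' column
salaries = pd.DataFrame({
    'ID': [1, 2, 4],
    'Salary': [50000, 60000, 70000]
})

# merge(): SQL-style join on a shared column; how='inner' keeps only matching IDs
inner = pd.merge(employees, salaries, on='ID', how='inner')
print(inner)

# how='left' keeps every employee; ID 3 has no salary, so it gets NaN
left = pd.merge(employees, salaries, on='ID', how='left')
print(left)

# join(): matches on the index, so 'ID' is moved into the index first
joined = employees.set_index('ID').join(salaries.set_index('ID'), how='outer')
print(joined)
```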


**4. Application in Data Science**

**Advantages of Using Pandas**:

**Efficient Data Handling**: Pandas provides powerful and flexible tools for manipulating numerical tables and time series data.

**Data Cleaning:** It simplifies data cleaning processes like handling missing data, removing duplicates, and type conversion.

**Data Analysis:** Functions like groupby, pivot_table, and various aggregation methods make data analysis efficient and intuitive.
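
As an illustration of `pivot_table`, the sketch below summarizes a small, made-up sales table (the `Region`, `Quarter`, and `Revenue` columns are hypothetical) into a region-by-quarter grid, summing any rows that land in the same cell:

```python
import pandas as pd

# Hypothetical sales records for illustration only
sales = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2', 'Q1'],
    'Revenue': [100, 150, 200, 120, 50]
})

# Rows become regions, columns become quarters, cells hold summed revenue
table = pd.pivot_table(sales, values='Revenue', index='Region',
                       columns='Quarter', aggfunc='sum')
print(table)
```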

**Real-World Examples:**

**Data Cleaning:** Pandas is extensively used in data cleaning tasks, preparing raw data for analysis or modeling.

**Exploratory Data Analysis (EDA):** During EDA, Pandas helps in understanding data distributions, relationships, and outliers.

**Financial Analysis:** Financial analysts use Pandas to manipulate large datasets, calculate metrics, and generate reports.

**Machine Learning:** Pandas is often used for preprocessing data before applying machine learning models, including tasks like feature engineering and data splitting.
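
A minimal plain-pandas sketch of the two preprocessing steps mentioned above: one-hot encoding a categorical feature with `pd.get_dummies`, and a random 80/20 train/test split with `DataFrame.sample`. The dataset is hypothetical, and in practice a dedicated helper such as scikit-learn's `train_test_split` is often used for the split:

```python
import pandas as pd

# Hypothetical dataset: a categorical feature, a numeric feature, and a target
df = pd.DataFrame({
    'City': ['sklm', 'vzm', 'vskp', 'sklm', 'vzm', 'vskp', 'sklm', 'vzm', 'vskp', 'sklm'],
    'Age': [25, 30, 35, 40, 45, 28, 33, 38, 43, 48],
    'Salary': [50000, 60000, 70000, 80000, 90000, 55000, 65000, 75000, 85000, 95000]
})

# Feature engineering: one indicator column per city replaces the text column
X = pd.get_dummies(df[['City', 'Age']], columns=['City'])
y = df['Salary']

# Data splitting: sample 80% of rows for training; random_state makes it repeatable
X_train = X.sample(frac=0.8, random_state=42)
X_test = X.drop(X_train.index)

# Select the matching target values by row label
y_train = y.loc[X_train.index]
y_test = y.loc[X_test.index]

print(X.columns.tolist())
print(len(X_train), len(X_test))  # 8 2
```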