# Working with Data using Pandas

**Definition** - Pandas is a powerful, flexible open-source data analysis and manipulation library for Python. It provides data structures such as Series (one-dimensional) and DataFrame (two-dimensional) that are efficient for handling large datasets, and it supports data manipulation, aggregation, and merging.

**Use Case in Real Life** - Pandas can be used in various data analysis scenarios, such as customer data analysis and marketing campaign analysis.
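
To make the two data structures named in the definition concrete, here is a minimal sketch. The variable names (`ages`, `cities`, `people`) are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')
cities = pd.Series(['New York', 'London', 'Paris'], index=ages.index, name='City')

# A DataFrame is a two-dimensional table; each column is itself a Series
people = pd.DataFrame({'Age': ages, 'City': cities})

print(ages.ndim, people.ndim)          # number of dimensions of each object
print(type(people['Age']).__name__)    # selecting one column gives back a Series
```

Selecting a single column from a DataFrame returns a Series, which is why the two structures are usually introduced together.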

# Creating a DataFrame from a dictionary

In [1]:
import pandas as pd

# Creating a dataframe from a dictionary
data = {
    'Name' : ['Alice', 'Bob', 'Charlie'],
    'Age' : [25, 30, 35],
    'City' : ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris


# Creating a DataFrame from a List of Dictionaries

In [2]:
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'London'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Paris'}
]
df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris


# Creating a DataFrame from a CSV File

In [3]:
# Assuming 'dataset.csv' is a CSV file in the current directory
df = pd.read_csv('dataset.csv')
print(df)

      Name  Age           City
0    Alice   25       New york
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40        Houston
4      Eve   28  San Francisco
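
The contents of `dataset.csv` are not shown in this notebook. The sketch below assumes a file whose text matches the printed output above and reads it from an in-memory string with `io.StringIO`, so the example can be run without the file. The names `csv_text` and `df_from_text` are hypothetical.

```python
import io
import pandas as pd

# Assumed file contents, reconstructed from the printed DataFrame
csv_text = """Name,Age,City
Alice,25,New york
Bob,30,Los Angeles
Charlie,35,Chicago
David,40,Houston
Eve,28,San Francisco
"""

# read_csv accepts a file path or any file-like object
df_from_text = pd.read_csv(io.StringIO(csv_text))
print(df_from_text.shape)
print(df_from_text.dtypes)
```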


# Viewing Data

In [4]:
# Displaying the first few rows
print(df.head())

      Name  Age           City
0    Alice   25       New york
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40        Houston
4      Eve   28  San Francisco


In [5]:
# Displaying the last few rows
print(df.tail())

      Name  Age           City
0    Alice   25       New york
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40        Houston
4      Eve   28  San Francisco


In [6]:
# Getting information about the DataFrame
# info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes


In [7]:
# Descriptive statistics
print(df.describe())

            Age
count   5.00000
mean   31.60000
std     5.94138
min    25.00000
25%    28.00000
50%    30.00000
75%    35.00000
max    40.00000


# Selecting columns

In [9]:
df

      Name  Age           City
0    Alice   25       New york
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40        Houston
4      Eve   28  San Francisco


In [8]:
# Selecting a single column
print(df['Name'])

0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object


In [10]:
# Selecting multiple columns
print(df[['Name', 'City']])

      Name           City
0    Alice       New york
1      Bob    Los Angeles
2  Charlie        Chicago
3    David        Houston
4      Eve  San Francisco
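
Bracket selection works on columns only. As a complementary sketch, `loc` (label-based) and `iloc` (position-based) select rows and columns together. The small `people` frame below is an illustrative stand-in, not the notebook's `df`.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris'],
})

# Label-based: index labels 0 through 1 (end label is included), two named columns
by_label = people.loc[0:1, ['Name', 'City']]

# Position-based: first two rows and first two columns (end position is excluded)
by_position = people.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```

Note the difference in slicing: `loc` includes the end label, while `iloc` follows the usual Python convention of excluding the end position.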


# Filtering Rows

In [11]:
# Filtering rows based on a condition
print(df[df['Age'] > 30])

      Name  Age     City
2  Charlie   35  Chicago
3    David   40  Houston
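
Conditions can also be combined. The sketch below uses `&` (and), `|` (or), and `isin` on a stand-in frame named `people` (an assumption for illustration). Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'City': ['New york', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco'],
})

# Both conditions must hold
between = people[(people['Age'] >= 28) & (people['Age'] <= 35)]

# Either condition may hold
older_or_chicago = people[(people['Age'] > 35) | (people['City'] == 'Chicago')]

# Membership test against a list of allowed values
in_cities = people[people['City'].isin(['Houston', 'Chicago'])]

print(between)
print(older_or_chicago)
print(in_cities)
```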


In [12]:
df

      Name  Age           City
0    Alice   25       New york
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40        Houston
4      Eve   28  San Francisco


# Adding new Columns

In [13]:
# Adding a new Column
df['Salary'] = [50000, 60000, 70000, 80000, 90000]
print(df)

      Name  Age           City  Salary
0    Alice   25       New york   50000
1      Bob   30    Los Angeles   60000
2  Charlie   35        Chicago   70000
3    David   40        Houston   80000
4      Eve   28  San Francisco   90000
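
A new column can also be computed from existing ones. The sketch below is a hedged illustration: the `Bonus` and `Total` columns and the 10% rule are invented for the example. It contrasts in-place assignment with `assign`, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Salary': [50000, 60000],
})

# In-place: adds a column to the existing DataFrame (hypothetical 10% bonus)
people['Bonus'] = people['Salary'] // 10

# assign() returns a copy with the extra column; `people` is not modified
with_total = people.assign(Total=lambda d: d['Salary'] + d['Bonus'])

print(people.columns.tolist())
print(with_total)
```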


# Modifying Existing Columns

In [14]:
# Modifying an existing column
df['Age'] = df['Age'] + 1
print(df)

      Name  Age           City  Salary
0    Alice   26       New york   50000
1      Bob   31    Los Angeles   60000
2  Charlie   36        Chicago   70000
3    David   41        Houston   80000
4      Eve   29  San Francisco   90000


# Dropping columns and rows

In [15]:
# Dropping a column
df = df.drop(columns=['Salary'])
print(df)

      Name  Age           City
0    Alice   26       New york
1      Bob   31    Los Angeles
2  Charlie   36        Chicago
3    David   41        Houston
4      Eve   29  San Francisco


In [16]:
# Dropping a row
df = df.drop(index=1)
print(df)

      Name  Age           City
0    Alice   26       New york
2  Charlie   36        Chicago
3    David   41        Houston
4      Eve   29  San Francisco
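
After a row is dropped, the index keeps its original labels, which is why label 1 is missing in the output above. If consecutive labels are wanted, one option is `reset_index(drop=True)`. The frame below is a small stand-in with assumed names.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [26, 31, 36, 41],
})

# Dropping label 1 leaves a gap in the index: 0, 2, 3
dropped = people.drop(index=1)

# drop=True discards the old labels instead of keeping them as a new column
renumbered = dropped.reset_index(drop=True)

print(dropped.index.tolist())
print(renumbered.index.tolist())
```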


# Grouping data

In [17]:
# Grouping data by a column
grouped = df.groupby('City')
print(grouped['Age'].mean())

City
Chicago          36.0
Houston          41.0
New york         26.0
San Francisco    29.0
Name: Age, dtype: float64


# Aggregating Data

In [19]:
# Aggregating data using multiple functions
aggregated = df.groupby('City').agg({'Age': ['mean', 'min', 'max']})
print(aggregated)

                Age        
               mean min max
City                       
Chicago        36.0  36  36
Houston        41.0  41  41
New york       26.0  26  26
San Francisco  29.0  29  29
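
In the output above every city appears once, so mean, min, and max are identical within each group. The sketch below uses invented data with repeated cities so the aggregation is visible, and uses named aggregation to produce flat column names instead of a two-level header. The frame `visits` and the result column names are assumptions.

```python
import pandas as pd

# Hypothetical data with several rows per city
visits = pd.DataFrame({
    'City': ['Chicago', 'Chicago', 'Houston', 'Houston', 'Houston'],
    'Age': [36, 24, 41, 29, 35],
})

# Named aggregation: new_column=(source_column, function)
summary = visits.groupby('City').agg(
    mean_age=('Age', 'mean'),
    min_age=('Age', 'min'),
    max_age=('Age', 'max'),
)

print(summary)
```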


# Merging dataframes

In [23]:
df1 = pd.DataFrame({
    'ID': [1,2,3],
    'Name': ['Alice', 'Bob', 'Charlie']
})

df2 = pd.DataFrame({
    'ID': [1,2,4],
    'City': ['London', 'Paris', 'New York']
})

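# Inner merge: keep only the IDs present in both DataFrames
merged = pd.merge(df1, df2, on='ID', how="inner")
print(merged)

# Outer merge: keep every ID from either DataFrame; missing values become NaN
merged_df = pd.merge(df1, df2, on='ID', how="outer")
print(merged_df)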


   ID   Name    City
0   1  Alice  London
1   2    Bob   Paris
   ID     Name      City
0   1    Alice    London
1   2      Bob     Paris
2   3  Charlie       NaN
3   4      NaN  New York
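
When checking an outer merge, it can help to see where each row came from. A minimal sketch using the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The names `left`, `right`, and `tracked` are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [1, 2, 4], 'City': ['London', 'Paris', 'New York']})

# _merge is 'both', 'left_only', or 'right_only' for each row
tracked = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(tracked)

# Rows that found no partner in the other DataFrame
unmatched = tracked[tracked['_merge'] != 'both']
print(unmatched)
```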


# Joining DataFrames

In [25]:
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}, index=[0, 1, 2])
df2 = pd.DataFrame({'City': ['London', 'Paris', 'New York']}, index=[0, 2, 4])

# Left join on the index: keep every row of df1 and attach matching rows of df2
joined = df1.join(df2, how="left")
print(joined)

      Name  Age    City
0    Alice   25  London
1      Bob   30     NaN
2  Charlie   35   Paris
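
`join` aligns rows on the index, whereas `merge` aligns on columns by default. As a sketch of how the two relate, the same left join can be written with `pd.merge` by asking it to use both indexes. The names `left`, `right`, and `via_merge` are assumptions for illustration.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}, index=[0, 1, 2])
right = pd.DataFrame({'City': ['London', 'Paris', 'New York']}, index=[0, 2, 4])

# Index-based left join
joined = left.join(right, how='left')

# The same operation expressed with merge on both indexes
via_merge = pd.merge(left, right, left_index=True, right_index=True, how='left')

print(joined)
print(joined.equals(via_merge))
```

Index label 4 exists only in `right`, so it does not appear in a left join, and label 1 exists only in `left`, so its `City` is NaN.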
