# Session 4: Working with Data in pandas

**Objective:** Learn to manipulate and analyze data effectively using pandas.  
Documentation: [pandas Getting Started](https://pandas.pydata.org/docs/getting_started/index.html)

## 1. Importing Data from Various Sources
The pandas library provides reader functions such as `read_csv`, `read_excel`, `read_json`, and `read_sql` for loading data from files and databases.

In [None]:
# Install openpyxl with piplite (available only in JupyterLite/Pyodide; elsewhere use `pip install openpyxl`)
import piplite
await piplite.install('openpyxl')

In [None]:
# Import pandas
import pandas as pd

In [None]:
# CSV File
df_csv = pd.read_csv('data.csv')  # Replace with actual file path
df_csv.head()
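
If no `data.csv` is at hand, the same call can be tried on in-memory text. The following is a minimal sketch, assuming a made-up three-row dataset (the column names `name`, `score`, and `joined` are illustrative and not part of the session files):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """name,score,joined
Ana,82,2023-01-15
Ben,,2023-02-01
Cara,67,2023-03-20
"""

# read_csv accepts any file-like object, not only a path;
# parse_dates converts the listed columns to datetime while loading
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])

print(df_demo.dtypes)
print(df_demo)
```

The empty `score` field for Ben is loaded as a missing value (`NaN`), which is why the column ends up with a floating-point dtype.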

In [None]:
# Excel File
df_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')  # .xlsx files require openpyxl (xlrd reads only legacy .xls)
df_excel.head()

In [None]:
# JSON File
df_json = pd.read_json('data.json')
df_json.head()

In [None]:
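# SQL Database
import sqlite3
connection = sqlite3.connect('database.db')  # Replace with actual database path
df_sql = pd.read_sql('SELECT * FROM table_name', connection)  # Replace table_name with a real table
connection.close()  # Release the database handle once the data is loaded
df_sql.head()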
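
To try `read_sql` without an existing `database.db`, an in-memory SQLite database can stand in. This is a sketch under assumptions: the `scores` table and its rows are invented for illustration.

```python
import sqlite3
import pandas as pd

# In-memory database, so the example needs no file on disk
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
connection.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ana", 82), ("Ben", 45), ("Cara", 67)],  # hypothetical rows
)
connection.commit()

# params passes values separately from the SQL text,
# which avoids building queries by string concatenation
df_scores = pd.read_sql(
    "SELECT name, score FROM scores WHERE score > ?",
    connection,
    params=(50,),
)
connection.close()

print(df_scores)
```

Filtering in the query itself, as done here with `WHERE`, keeps pandas from loading rows that would be discarded right away.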

## 2. Exploring Data
These basic methods give a quick overview of a DataFrame's contents, statistics, and structure.

In [None]:
# Display first rows
df_csv.head()

In [None]:
# Summary statistics (numeric columns only by default)
df_csv.describe()

In [None]:
# Metadata info
df_csv.info()
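
Beyond `head()`, `describe()`, and `info()`, a few other checks are often useful before cleaning. The sketch below uses a small made-up DataFrame (not one of the session files) to count missing values and duplicate rows:

```python
import pandas as pd

# Hypothetical data: one missing score and one repeated row
df_demo = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Ana"],
    "score": [82.0, None, 67.0, 82.0],
})

print(df_demo.shape)    # (number of rows, number of columns)
print(df_demo.dtypes)   # data type of each column

missing = df_demo.isna().sum()        # missing values per column
n_dupes = df_demo.duplicated().sum()  # rows identical to an earlier row

print(missing)
print("Duplicate rows:", n_dupes)
```

These counts indicate which of the cleaning steps in the next section are actually needed for a given dataset.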

## 3. Cleaning and Transforming Data
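The cells below cover common cleaning tasks: filling missing values, removing duplicates, renaming columns, converting data types, and filtering rows.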

In [None]:
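# Fill missing values with the column mean
# (assign the result back; calling fillna(inplace=True) on a selected column
# is deprecated chained assignment and may not modify df_csv)
if 'column_name' in df_csv.columns:
    df_csv['column_name'] = df_csv['column_name'].fillna(df_csv['column_name'].mean())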

In [None]:
# Remove duplicates
df_csv.drop_duplicates(inplace=True)

In [None]:
# Rename columns
df_csv.rename(columns={'old_column': 'new_column'}, inplace=True)

In [None]:
# Change data types
if 'date_column' in df_csv.columns:
    df_csv['date_column'] = pd.to_datetime(df_csv['date_column'])
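
Real date columns often contain values that cannot be parsed. As a hedged sketch, `errors="coerce"` turns such values into `NaT` (pandas' missing-date marker) instead of raising an exception; the sample strings below are invented for illustration:

```python
import pandas as pd

# Hypothetical raw values: one valid date, one impossible date, one non-date
raw_dates = pd.Series(["2024-01-15", "2024-02-30", "not a date"])

# An explicit format makes parsing predictable; unparseable entries become NaT
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

print(parsed)
print("Unparseable values:", parsed.isna().sum())
```

Counting the `NaT` values afterwards shows how many entries failed to convert, which is worth checking before dropping or filling them.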

In [None]:
# Filter rows based on a condition
# (print explicitly: a bare expression inside an if block is not displayed by the notebook)
if 'column_name' in df_csv.columns:
    filtered_df = df_csv[df_csv['column_name'] > 50]
    print(filtered_df.head())
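
Filters can also combine several conditions. The sketch below, on a made-up DataFrame, shows two equivalent spellings: boolean masks joined with `&`, and `DataFrame.query`. The column names `score` and `group` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data
df_demo = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "score": [82, 45, 67, 91],
    "group": ["a", "b", "a", "b"],
})

# Each condition needs its own parentheses; use & (and) or | (or), not `and`/`or`
high_in_a = df_demo[(df_demo["score"] > 50) & (df_demo["group"] == "a")]

# The same filter written as a query string
same_with_query = df_demo.query("score > 50 and group == 'a'")

print(high_in_a)
```

Both forms return the same rows; `query` tends to read better when there are many conditions.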

## Activity: Load and Clean a Dataset Using pandas

In [None]:
try:
    df = pd.read_csv('sample_data.csv')
    df.dropna(subset=['important_column'], inplace=True)
    df.rename(columns={'old_name': 'new_name'}, inplace=True)
    df = df[df['score'] > 70]
    print("Cleaned Dataset:")
    print(df.head())
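except FileNotFoundError:
    print("sample_data.csv not found. Please make sure the file exists in the working directory.")
except KeyError as err:
    # Raised when 'important_column' or 'score' is not present in the file
    print(f"Expected column is missing: {err}")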
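
For practice without a `sample_data.csv` file, the same steps can be run on in-memory text. This is a minimal sketch, assuming an invented four-row dataset that reuses the activity's column names; it chains the steps instead of using `inplace=True`, which is a stylistic alternative rather than the required approach.

```python
import io
import pandas as pd

# Hypothetical stand-in for sample_data.csv
csv_text = """old_name,important_column,score
Ana,x,82
Ben,,95
Cara,y,67
Dev,z,74
"""

df = pd.read_csv(io.StringIO(csv_text))

# Same cleaning steps as the activity, written as one chain
df = (
    df.dropna(subset=["important_column"])      # drop rows missing the key column
      .rename(columns={"old_name": "new_name"})
      .query("score > 70")                      # keep scores above 70
      .reset_index(drop=True)
)

print("Cleaned Dataset:")
print(df)
```

Ben is removed because `important_column` is empty, and Cara is removed because her score is not above 70.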