# Introduction to Pandas DataFrame and Series

## Pandas DataFrame and Series – Data Structure
A **Series** is a one-dimensional, array-like object that holds a sequence of values together with an associated array of labels, called its index.
A **DataFrame** is a two-dimensional, size-mutable, tabular data structure with labeled axes (rows and columns).

In [None]:
import pandas as pd

# Creating a Series
data = [10, 20, 30, 40]
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print("Series:")
print(series)

# Creating a DataFrame
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data_dict)
print("\nDataFrame:")
print(df)
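
The relationship between the two structures can be sketched briefly: selecting one column of a DataFrame returns a Series that shares the DataFrame's row index. The variable names `value_b` and `age_column` below are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
})

# Label-based access on the Series index
value_b = series['b']
print(value_b)

# A single DataFrame column is itself a Series with the same row index
age_column = df['Age']
print(type(age_column).__name__)
print(df.shape)  # (rows, columns)
```

Because a column is a Series, anything that works on a Series (such as `.mean()` or label-based access) also works on a selected column.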

## Importing Data into DataFrames
pandas can read data from many sources, such as CSV files, Excel workbooks, and SQL databases. The example below writes a small CSV file and then reads it back.

In [None]:
# Example: Importing data from CSV (for demonstration we will create CSV using pandas)
sample_data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [1200, 800, 300],
    'Quantity': [5, 10, 15]
}
sample_df = pd.DataFrame(sample_data)
sample_df.to_csv('sample.csv', index=False)

# Now import it
df_csv = pd.read_csv('sample.csv')
print("DataFrame from CSV:")
print(df_csv)
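
`pd.read_csv` also accepts options that control which columns are loaded and how they are typed. The following is a minimal sketch, assuming the CSV content is held in an in-memory string (`csv_text`, a hypothetical stand-in for a file on disk) so that the example runs without creating any files.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory so the example needs no file on disk
csv_text = "Product,Price,Quantity\nLaptop,1200,5\nPhone,800,10\nTablet,300,15\n"

df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Product', 'Price'],   # load only these columns
    dtype={'Price': 'float64'},     # read Price as floating point
)
print(df_small)
print(df_small.dtypes)
```

Restricting `usecols` and setting `dtype` up front can be useful for large files, since unneeded columns are never loaded.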

## Indexing DataFrames
Pandas provides multiple ways to index data: by column name, by row label with the `.loc` indexer, or by integer position with the `.iloc` indexer.

In [None]:
# Column indexing
print("Prices Column:")
print(df_csv['Price'])

# Row indexing using iloc
print("\nFirst Row:")
print(df_csv.iloc[0])

# Row indexing using loc
print("\nRow with index 1:")
print(df_csv.loc[1])
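
With the default integer index, `.loc[1]` and `.iloc[1]` happen to return the same row, which can hide the difference between them. The sketch below uses a hypothetical string index (`'x'`, `'y'`, `'z'`) to make the distinction visible.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Product': ['Laptop', 'Phone', 'Tablet'], 'Price': [1200, 800, 300]},
    index=['x', 'y', 'z'],  # hypothetical row labels
)

# .loc selects by label, .iloc selects by integer position
row_by_label = df_labeled.loc['y']
row_by_position = df_labeled.iloc[1]

# Label slices include the end label; position slices exclude the end position
label_slice = df_labeled.loc['x':'y']
position_slice = df_labeled.iloc[0:1]

# .loc also takes a (row label, column name) pair to read one cell
tablet_price = df_labeled.loc['z', 'Price']
print(len(label_slice), len(position_slice), tablet_price)
```

Note the slicing asymmetry: `.loc['x':'y']` returns two rows, while `.iloc[0:1]` returns one.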

## Basic Operations on DataFrames
Let's perform some common operations like renaming columns, subsetting, and filtering data.

In [None]:
# Renaming Columns
df_csv_renamed = df_csv.rename(columns={'Product': 'Item', 'Price': 'Cost'})
print("Renamed DataFrame:")
print(df_csv_renamed)

# Subsetting
subset = df_csv_renamed[['Item', 'Cost']]
print("\nSubset DataFrame:")
print(subset)

# Filtering
filtered = df_csv_renamed[df_csv_renamed['Cost'] > 500]
print("\nFiltered DataFrame (Cost > 500):")
print(filtered)
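
Filters can also combine several conditions. As a minimal sketch, assuming the same renamed columns as above, boolean masks are joined with `&` (and) or `|` (or), with each condition wrapped in parentheses. `DataFrame.query` expresses the same filter as a string.

```python
import pandas as pd

df_items = pd.DataFrame({
    'Item': ['Laptop', 'Phone', 'Tablet'],
    'Cost': [1200, 800, 300],
    'Quantity': [5, 10, 15],
})

# Parentheses are required because & binds more tightly than > and >=
mask = (df_items['Cost'] > 500) & (df_items['Quantity'] >= 10)
filtered_mask = df_items[mask]

# Equivalent filter written as a query string
filtered_query = df_items.query('Cost > 500 and Quantity >= 10')

print(filtered_mask)
```

Both forms keep the original row labels, so the result here still carries index label 1 for the matching row.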

## Data Imputation in DataFrame
Data imputation is the process of replacing missing values with substitute values. pandas offers methods such as `fillna` and `interpolate` for this.

In [None]:
# Example DataFrame with missing values
import numpy as np
data_with_nan = {
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
}
df_nan = pd.DataFrame(data_with_nan)
print("DataFrame with NaN values:")
print(df_nan)

# Impute missing values with each column's mean
df_filled = df_nan.fillna(df_nan.mean())
print("\nDataFrame after Mean Imputation:")
print(df_filled)
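
The text above also mentions `interpolate`. A minimal sketch on the same data follows, assuming the default linear method is appropriate, which treats the rows as evenly spaced and fills each gap from its neighbouring values in the same column.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Default method is linear: gaps are filled along each column
df_interpolated = df_gaps.interpolate()
print(df_interpolated)

# Confirm that no missing values remain
print(df_interpolated.isna().sum().sum())
```

Unlike mean imputation, which writes the same value into every gap of a column, interpolation follows the local trend, so it tends to suit ordered data such as time series.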