# Week 3: Importing and Handling Data with Pandas

### Objectives
- Gain a foundational understanding of the Pandas library for data manipulation.
- Learn how to load datasets from external files and perform basic data inspection tasks.

### Topics
- **Introduction to Pandas**: Explore Pandas as a key data analysis library in Python, known for its powerful data structures and data manipulation capabilities.
- **Loading Data from CSV**: Understand how to read data from CSV files, one of the most common formats for datasets.
- **Viewing Data**: Learn methods for inspecting the first few rows of data and summarizing it for initial understanding.

### Content

1. **Introduction to Pandas**
   - **Overview of Pandas Library**:
     - Pandas is an open-source data analysis and manipulation library for Python. It is built on top of NumPy and provides flexible, labeled data structures for working with tabular data that fits in memory.
     - Pandas offers two primary data structures:
       - **Series**: A one-dimensional labeled array, similar to a single column in a table (or a list whose items carry index labels).
       - **DataFrame**: A two-dimensional labeled data structure with columns of potentially different data types, similar to a spreadsheet or SQL table.
   - **Why Use Pandas?**:
     - Pandas simplifies many data handling and manipulation tasks:
       - **Data Cleaning**: Handle missing values, duplicates, and inconsistent data.
       - **Data Transformation**: Filter, sort, and reshape data as needed.
       - **Data Analysis**: Perform aggregations, group data, and generate basic statistics.
     - **Example**:
       ```python
       import pandas as pd
       data = pd.Series([10, 20, 30])
       print(data)  # Displays the Series
       ```
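
As a minimal sketch of the two data structures and the three kinds of tasks listed above (cleaning, transformation, analysis), the following uses a small made-up table. The column names `city` and `sales` and all values are assumptions for illustration, not part of any course dataset.

```python
import pandas as pd

# A Series: one-dimensional, with an index label for each value.
prices = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: two-dimensional, each column can hold a different type.
# Column names and values are hypothetical.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "sales": [100.0, None, 300.0, 50.0],
    }
)

# Data cleaning: drop rows that contain a missing value.
clean = df.dropna()

# Data transformation: keep rows with sales above 60, sorted high to low.
large = clean[clean["sales"] > 60].sort_values("sales", ascending=False)

# Data analysis: total sales per city.
totals = clean.groupby("city")["sales"].sum()

print(prices["b"])  # look up a Series value by its label
print(large)
print(totals)
```

Each step returns a new object and leaves `df` unchanged, which makes it easy to compare the data before and after a step.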

2. **Loading Data from CSV**
   - **What is a CSV File?**:
     - CSV (Comma-Separated Values) files are a common plain-text format for datasets: each line represents a row, and the values in a row are separated by commas. Many datasets are distributed as CSV because the format is simple and most software tools can read it.
   - **Using `pd.read_csv()` to Load Data**:
     - The `pd.read_csv()` function is used to read CSV files into a DataFrame, making it accessible for analysis within Python.
     - **Syntax**: `pd.read_csv('file_path')`, where `'file_path'` is the location of the CSV file.
     - **Example**:
       ```python
       import pandas as pd

       # Load a CSV file into a DataFrame
       df = pd.read_csv('data.csv')
       ```
   - **Common Parameters for `pd.read_csv()`**:
     - **`sep`**: Specifies the delimiter if it’s not a comma (e.g., tab-separated files can use `sep='\t'`).
     - **`header`**: Indicates which row to use as the column names (by default the first row is used; pass `header=None` if the file has no header row).
     - **`index_col`**: Sets a column as the index for the DataFrame.
     - **Example with Parameters**:
       ```python
       # sep=',' and header=0 match the defaults; index_col='id' assumes the file has an 'id' column
       df = pd.read_csv('data.csv', sep=',', header=0, index_col='id')
       ```
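
The parameters above can be tried without a file on disk. The sketch below feeds hypothetical in-memory text to `pd.read_csv()` through `io.StringIO`; the data, the column names, and the `names` parameter (a real `read_csv` option not covered above, used here to label columns when there is no header row) are assumptions added for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; StringIO stands in for a file on disk.
raw = "id\tname\tscore\n1\tAna\t90\n2\tBen\t85\n"

# sep='\t' sets the delimiter, header=0 takes column names from the first row,
# and index_col='id' uses the 'id' column as the row index.
scores = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col="id")
print(scores)

# header=None says the data has no header row; names= supplies the labels.
no_header = "1,Ana,90\n2,Ben,85\n"
named = pd.read_csv(
    io.StringIO(no_header), header=None, names=["id", "name", "score"]
)
print(named)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.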

3. **Viewing Data**
   - **Inspecting the First Few Rows with `head()`**:
     - The `head()` method is used to view the first few rows of the DataFrame, allowing for a quick inspection of the data structure, columns, and sample values.
     - **Syntax**: `df.head(n)`, where `n` is the number of rows to view (default is 5).
     - **Example**:
       ```python
       df = pd.read_csv('data.csv')
       print(df.head())  # Displays the first 5 rows by default
       ```
   - **Checking the Structure with `info()`**:
     - The `info()` method prints a summary of the DataFrame, including the number of rows and columns, column names, data types, and the count of non-null values in each column (a count lower than the number of rows indicates missing values).
     - **Example**:
       ```python
       df.info()
       ```
   - **Viewing Columns with `columns` Attribute**:
     - Access the `columns` attribute to list all column names, which is helpful for understanding the data and selecting specific columns for analysis.
     - **Example**:
       ```python
       print(df.columns)
       ```
   - **Checking the Shape of Data with `shape`**:
     - The `shape` attribute returns a tuple of the number of rows and columns in the DataFrame, helping gauge the size of the dataset.
     - **Example**:
       ```python
       print(df.shape)  # Output: (rows, columns)
       ```
   - **Summary Statistics with `describe()`**:
     - The `describe()` method generates summary statistics for numerical columns by default: count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum.
     - **Example**:
       ```python
       print(df.describe())  # print() is needed to see the result when running as a script
       ```
   - **Example Workflow for Initial Data Inspection**:
     - A common workflow for loading and inspecting data might include:
       1. Load the dataset with `pd.read_csv()`.
       2. View the first few rows with `head()`.
       3. Check the structure with `info()`.
       4. Get summary statistics with `describe()`.
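
The four steps above can be sketched end to end as follows. The data is hypothetical and held in memory via `io.StringIO` so the example runs without a file; `isna().sum()` is an extra step, not listed above, that counts missing values per column directly.

```python
import io

import pandas as pd

# Hypothetical data; the empty field on the Ben row is a missing 'age' value.
raw = "name,age,city\nAna,30,Oslo\nBen,,Lima\nCy,40,Oslo\n"

# 1. Load the dataset.
df = pd.read_csv(io.StringIO(raw))

# 2. View the first rows (here the first two).
print(df.head(2))

# 3. Check the structure: dtypes and non-null counts, then size and columns.
df.info()
print(df.shape)
print(list(df.columns))

# 4. Summary statistics for the numeric columns (only 'age' in this data).
summary = df.describe()
print(summary)

# Extra step: count missing values in each column.
missing = df.isna().sum()
print(missing)
```

Comparing the non-null counts from `info()` with the row count from `shape` is often the quickest way to spot columns that need cleaning before analysis.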

### Exercises
- **Exercise 1**: Load a sample CSV file into a DataFrame and display the first 5 rows using `head()`.
- **Exercise 2**: Use the `info()` method to display the structure of the DataFrame and list all column names with the `columns` attribute.
- **Exercise 3**: Use `describe()` to generate summary statistics for a dataset and interpret key values like mean and standard deviation.
