# Session 3: Exploring Structured Data

**Unit 1: Introduction to Data Science**
**Hour: 3**
**Mode: Practical Lab**

---

### 1. Objective

Welcome to your first hands-on lab! By the end of this session, you will be able to load a structured data file (CSV) into Python and perform basic inspections to understand its contents.

**What is Structured Data?** It's data organized in a tabular format, like a spreadsheet: each row is one record and each column is one attribute of that record. CSV (Comma-Separated Values) is one of the most common file formats for structured data.

### 2. Setup: Importing the Pandas Library

Pandas is one of the most widely used Python libraries for data analysis. It gives us a powerful data structure called a **DataFrame**, which is designed for working with table-like data.

We import it with the alias `pd`, which is a universal convention among data scientists.

In [None]:
import pandas as pd
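
To get a feel for what a DataFrame is before we load a file, here is a minimal sketch that builds one by hand. The variable name `games` is just an illustrative choice, and the three rows are copied from the dataset used later in this lab.

```python
import pandas as pd

# Each dictionary key becomes a column name,
# and each list holds that column's values, one per row.
games = pd.DataFrame({
    "Name": ["Wii Sports", "Tetris", "Duck Hunt"],
    "Platform": ["Wii", "GB", "NES"],
    "Year": [2006, 1989, 1984],
})

print(type(games))  # confirms the object is a pandas DataFrame
print(games)        # prints the table with a row index on the left
```

Notice that pandas adds a row index (0, 1, 2) on its own. We will see the same index when we load the CSV file.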

### 3. Obtain: Loading the Data

We will use a simple dataset about video game sales. First, we need to create the CSV file. The code below writes a string of data to a file named `vgsales.csv`.

In [None]:
# Data as a multi-line string
csv_data = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33
5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1,31.37
6,Tetris,GB,1989,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
7,New Super Mario Bros.,DS,2006,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
8,Wii Play,Wii,2006,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
9,New Super Mario Bros. Wii,Wii,2009,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
10,Duck Hunt,NES,1984,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
"""

# Write the string to a file (an explicit encoding keeps the result the same on every platform)
with open('vgsales.csv', 'w', encoding='utf-8') as f:
    f.write(csv_data)

Now that the file exists, we can use the `pd.read_csv()` function to load it into a Pandas DataFrame. We'll store it in a variable called `df` (short for DataFrame).

In [None]:
df = pd.read_csv('vgsales.csv')
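
As a side note, `pd.read_csv()` also accepts file-like objects, so CSV text can be parsed without writing a file first. The sketch below assumes a shortened two-row sample of the same data; `sample_text` and `sample_df` are hypothetical names used only for this illustration.

```python
import io

import pandas as pd

# A shortened sample of the lab's data, kept in memory as a string
sample_text = """Rank,Name,Platform,Year
1,Wii Sports,Wii,2006
2,Super Mario Bros.,NES,1985
"""

# io.StringIO wraps the string so it behaves like an open text file
sample_df = pd.read_csv(io.StringIO(sample_text))

print(sample_df)
```

Writing the file first, as we did above, is closer to real projects, where the data usually arrives as a file on disk.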


### 4. Explore: Basic Inspection

The data is loaded! Now, let's perform our first inspection steps. These commands are typically among the first things a data scientist runs after loading a new dataset.

#### 4.1. `.head()` - View the First Few Rows

This is a quick way to get a feel for the data and see what the columns look like. By default, it shows the first 5 rows.

In [None]:
df.head()

You can also specify how many rows you want to see by passing a number inside the parentheses.

In [None]:
df.head(3) # Show the first 3 rows
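
The counterpart to `.head()` is `.tail()`, which shows rows from the end of the table. Here is a small sketch on a hand-built six-row table; `top_games`, `first_rows`, and `last_rows` are illustrative names, and the rows are taken from the lab's dataset.

```python
import pandas as pd

top_games = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 6],
    "Name": [
        "Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
        "Wii Sports Resort", "Pokemon Red/Pokemon Blue", "Tetris",
    ],
})

first_rows = top_games.head()   # no argument: the first 5 of the 6 rows
last_rows = top_games.tail(2)   # the last 2 rows

print(first_rows)
print(last_rows)
```

Checking both ends of a table is a cheap way to spot problems such as a stray footer row at the bottom of a file.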

#### 4.2. `.shape` - Check the Dimensions

This attribute tells you how many rows and columns are in your dataset. It returns a tuple in the format `(rows, columns)`.

In [None]:
df.shape

**Interpretation:** We have 10 rows and 11 columns.
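
Because `.shape` is an ordinary Python tuple, it can be unpacked into two variables. The sketch below uses a small hand-built table; `small_df`, `n_rows`, and `n_cols` are hypothetical names chosen for this example.

```python
import pandas as pd

small_df = pd.DataFrame({
    "Name": ["Wii Sports", "Tetris", "Duck Hunt"],
    "Global_Sales": [82.74, 30.26, 28.31],
})

# Unpack the (rows, columns) tuple into two separate variables
n_rows, n_cols = small_df.shape
print(f"{n_rows} rows and {n_cols} columns")

# len() on a DataFrame counts rows only
print(len(small_df))
```

Note that `.shape` is an attribute, not a method, so it is written without parentheses.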

#### 4.3. `.info()` - Get a Technical Summary

This method provides a concise summary of the DataFrame. It's extremely useful for the "Scrub" phase because it tells you:
*   The number of entries (rows).
*   The name of each column.
*   The number of non-null (i.e., not missing) values for each column.
*   The data type (Dtype) of each column (e.g., `int64` for integers, `float64` for decimals, and `object` for text; recent pandas releases may report a string dtype such as `str` instead of `object`).
*   How much memory the DataFrame is using.

In [None]:
df.info()

**Interpretation:**
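- We can see all columns have 10 non-null values, which means there is no missing data in this small sample.
- `Rank` and `Year` are integers (`int64`), the sales columns are floats (`float64`), and text columns like `Name` and `Platform` have the `object` dtype (recent pandas releases may report a string dtype such as `str` instead). This matches what we expect from the data.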
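
The two facts we just read off the `.info()` output can also be checked directly in code. This is a minimal sketch on a three-row sample of the same data; `sample_text`, `sample_df`, and `missing_per_column` are illustrative names, not part of the lab's main notebook.

```python
import io

import pandas as pd

sample_text = """Name,Platform,Year,Global_Sales
Wii Sports,Wii,2006,82.74
Tetris,GB,1989,30.26
Duck Hunt,NES,1984,28.31
"""
sample_df = pd.read_csv(io.StringIO(sample_text))

# Count missing values per column: isna() marks each missing cell as True,
# and sum() adds up the True values column by column
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Show only the data type of each column
print(sample_df.dtypes)
```

A count of zero for every column agrees with the "non-null" figures from `.info()`. On a larger dataset, any column with a count above zero is a candidate for cleaning in the "Scrub" phase.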

### 5. Conclusion

Congratulations! You have successfully:
1.  Imported the Pandas library.
2.  Loaded a structured CSV file into a DataFrame.
3.  Performed the three most fundamental inspection steps: `.head()`, `.shape`, and `.info()`.

These simple commands are the starting point for almost every data analysis project. In the next lab, we'll see how to handle a different kind of data: semi-structured JSON.