# Level 3: Data Loading & Exporting

Getting data into and out of pandas is the first step in any data analysis workflow. This level covers the most common ways to read and write CSV, Excel, and JSON files, and how to inspect a DataFrame once it is loaded.

In [1]:
import pandas as pd

## 3.1 Reading Data

### Reading CSV Files
CSV (Comma-Separated Values) files are one of the most common formats for storing tabular data. Use `pd.read_csv()`.

In [2]:
df_csv = pd.read_csv('../data/sample.csv')
print("Data from CSV:")
print(df_csv)

Data from CSV:
      Name   Age         City
0    Alice  25.0     New York
1      Bob  30.0  Los Angeles
2  Charlie  35.0      Chicago
3    David   NaN      Houston
4      Eva  22.0      Phoenix


### Reading Excel Files
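To read Excel files, use `pd.read_excel()`. Reading `.xlsx` files relies on the optional `openpyxl` package; if it is not installed, the call raises an `ImportError`, so install it first with `pip install openpyxl`.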

In [3]:
df_excel = pd.read_excel('../data/sample.xlsx')
print("Data from Excel:")
print(df_excel)

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

### Reading JSON Files
For JSON files, use `pd.read_json()`.

In [None]:
df_json = pd.read_json('../data/sample.json')
print("Data from JSON:")
print(df_json)
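
As a minimal sketch of what `pd.read_json()` expects, the example below parses a small JSON string instead of a file. The two records are hypothetical and wrapped in `io.StringIO` so the snippet runs without `sample.json`.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of row objects ("records" layout).
raw_json = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# read_json accepts a path or a file-like object; StringIO stands in for a file.
df_from_json = pd.read_json(io.StringIO(raw_json))
print(df_from_json)
```

Each JSON object becomes one row, and the object keys become the column names.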

### Handling File Properties
`pd.read_csv()` accepts many useful parameters, and some of them are shared by the other read functions. For example:
- `sep`: Specify the delimiter (e.g., `sep='\t'` for tab-separated files).
- `header`: Specify which row to use as the header (e.g., `header=None` if the file has no header row).
- `names`: Provide a list of column names.
- `encoding`: Specify the file encoding (e.g., `'utf-8'`, `'latin1'`).
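
The four parameters above can be combined in a single `pd.read_csv()` call. The sketch below is illustrative only: the tab-separated, header-less, Latin-1 encoded bytes are made up for this example, and `io.BytesIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical file content: tab-separated, no header row, Latin-1 encoded.
raw_bytes = "Zoë\t25\tMálaga\nBob\t30\tLos Angeles\n".encode("latin1")

df_tsv = pd.read_csv(
    io.BytesIO(raw_bytes),           # a file path would work the same way
    sep="\t",                        # fields are separated by tabs
    header=None,                     # the first line is data, not a header
    names=["Name", "Age", "City"],   # so the column names are supplied here
    encoding="latin1",               # decode the bytes as Latin-1
)
print(df_tsv)
```

Without `header=None`, the first data row would be consumed as column names; without the matching `encoding`, the accented characters would fail to decode or come out garbled.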

## 3.2 Writing Data

You can easily save your DataFrames to various formats.

In [None]:
df_to_write = df_csv.copy()
df_to_write['NewColumn'] = [1, 2, 3, 4, 5]

### Writing to CSV
Use `.to_csv()`. A common parameter is `index=False` to avoid writing the DataFrame index as a column.

In [None]:
df_to_write.to_csv('../data/output.csv', index=False)
print("DataFrame saved to output.csv")
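
To see what `index=False` changes, the sketch below writes a small hypothetical DataFrame both ways. Calling `.to_csv()` without a path returns the CSV text as a string, which makes the difference easy to compare without touching the file system.

```python
import pandas as pd

# Small made-up DataFrame, used only to compare the two outputs.
demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

with_index = demo.to_csv()                # index written as an unnamed first column
without_index = demo.to_csv(index=False)  # only the real columns are written

print(with_index.splitlines()[0])     # header line: ,Name,Age
print(without_index.splitlines()[0])  # header line: Name,Age
```

The leading comma in the first header is the unnamed index column. Reading such a file back typically produces an extra `Unnamed: 0` column, which is why `index=False` is so common.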

### Writing to Excel
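Use `.to_excel()`. As with reading, writing `.xlsx` files needs an Excel engine such as `openpyxl` to be installed. The `sheet_name` parameter sets the name of the worksheet.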

In [None]:
df_to_write.to_excel('../data/output.xlsx', index=False, sheet_name='MyData')
print("DataFrame saved to output.xlsx")

### Writing to JSON
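Use `.to_json()`. The `orient` parameter controls the JSON layout (`orient='records'` writes a list with one object per row), and `indent` pretty-prints the output.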

In [None]:
df_to_write.to_json('../data/output.json', orient='records', indent=4)
print("DataFrame saved to output.json")
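
The effect of `orient` is easiest to see side by side. This sketch uses a small hypothetical DataFrame and parses the resulting JSON strings back with the standard `json` module so the two layouts can be compared as plain Python objects.

```python
import json
import pandas as pd

# Small made-up DataFrame, used only to compare the two layouts.
demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# orient="records": a list with one object per row.
records = json.loads(demo.to_json(orient="records"))

# Default for a DataFrame (orient="columns"): column -> {index label -> value}.
columns = json.loads(demo.to_json())

print(records)
print(columns)
```

The `records` layout drops the index entirely, while the default layout keeps it as string keys inside each column.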

## 3.3 Data Inspection

After loading data, it's crucial to inspect it to understand its structure and content.

In [None]:
df = df_csv
print("Using this DataFrame for inspection:")
print(df)

### `.head()`, `.tail()`, `.sample()` (Recap)
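These are the quickest ways to get a glimpse of your data: `.head(n)` returns the first `n` rows, `.tail(n)` the last `n` rows, and `.sample(n)` a random selection of `n` rows.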

In [None]:
df.head(2)
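
The other two methods named in the heading work the same way. In the sketch below, `demo` is rebuilt by hand from the `Name` and `Age` columns printed in section 3.1 so the snippet runs on its own; `random_state=0` is an arbitrary seed chosen here to make the sample repeatable.

```python
import pandas as pd

# Rebuilt from the CSV output shown earlier (City column omitted for brevity).
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25.0, 30.0, 35.0, None, 22.0],
})

last_two = demo.tail(2)                           # the last two rows: David and Eva
random_three = demo.sample(n=3, random_state=0)   # three rows, drawn without replacement

print(last_two)
print(random_three)
```

Omitting `random_state` gives a different selection on each run, which is usually what you want when spot-checking a large file.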

### `.nunique()` and `.value_counts()`

**.nunique():** Returns the number of unique values in each column. Missing values are not counted by default.

In [None]:
df.nunique()

**.value_counts():** Returns a Series with the count of each unique value in a column, sorted from most to least frequent. Missing values are excluded by default. Very useful for categorical data.

In [None]:
df['City'].value_counts()
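
Every city in the sample data appears exactly once, so the counts above are all 1. The sketch below uses a hypothetical Series with repeated and missing values to show two commonly used options, `dropna=False` and `normalize=True`.

```python
import pandas as pd

# Made-up values with repeats and one missing entry.
cities = pd.Series(["Chicago", "Houston", "Chicago", None, "Chicago"])

counts = cities.value_counts()                   # missing value excluded
with_missing = cities.value_counts(dropna=False) # missing value gets its own row
shares = cities.value_counts(normalize=True)     # fractions of the non-missing values

print(counts)
print(with_missing)
print(shares)
```

Here `counts` sums to 4 while `with_missing` sums to 5, which is a quick way to notice that a column contains gaps.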

### `.isna()` and `.notna()` for Missing Data

These methods are essential for identifying missing data (represented as `NaN` or `None`).

**.isna():** Returns a boolean DataFrame of the same shape, with `True` wherever a value is missing. `.notna()` is its inverse: it returns `True` wherever a value is present.

In [None]:
df.isna()

To get a count of missing values per column, you can chain it with `.sum()`.

In [None]:
df.isna().sum()
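
`.notna()` is most often used as a row filter. In this sketch, `demo` is rebuilt by hand from the `Name` and `Age` columns printed in section 3.1 so the snippet runs on its own; it counts the missing values per column and then keeps only the rows where `Age` is present.

```python
import pandas as pd

# Rebuilt from the CSV output shown earlier; David's age is missing.
demo = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25.0, 30.0, 35.0, None, 22.0],
})

missing_per_column = demo.isna().sum()   # True counts as 1, so this tallies the gaps
has_age = demo[demo["Age"].notna()]      # boolean mask keeps rows with a known age

print(missing_per_column)
print(has_age)
```

The filter returns a new DataFrame and leaves `demo` unchanged.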