
# 📊 Read & Export Data Like a Pro with Pandas

**Hook:** Dealing with CSVs or Excel? Here’s how to read, clean, and save your data with pandas — no tech headache involved. 🚀

This notebook will guide you through the essential steps of data handling using the powerful pandas library in Python. From loading your data to cleaning it and finally exporting it, we'll cover everything you need to know to become a data pro!


---
## 📘 Author Information

**👨‍💻 Name:** Abdul Rehman  
**📌 Role:** Data Science Enthusiast | Python Learner  
**📅 Notebook Created:** July 2025  

**🔗 Connect with Me:**  


[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue?style=flat&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/abdul-rehman-74b418350/)
[![GitHub](https://img.shields.io/badge/GitHub-black?style=flat&logo=github&logoColor=white)](https://github.com/datawithrehman/Data-Science-Beginning)
[![Twitter](https://img.shields.io/badge/Twitter-blue?style=flat&logo=twitter&logoColor=white)](https://x.com/datawithrehman)





## 📥 Reading Files: CSV and Excel

The first step in any data analysis workflow is to load your data. Pandas provides intuitive functions to read various file formats, including CSV (Comma Separated Values) and Excel files.

- `pd.read_csv()`: For reading data from CSV files.
- `pd.read_excel()`: For reading data from Excel files. For `.xlsx` files this relies on an Excel engine such as `openpyxl` being installed.

Let's see how it works!


In [None]:

import pandas as pd

# Create dummy CSV and Excel files for demonstration
# In a real scenario, you would replace these with your actual file paths.

dummy_data_csv = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df_dummy_csv = pd.DataFrame(dummy_data_csv)
df_dummy_csv.to_csv('dummy_data.csv', index=False)

dummy_data_excel = {
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'Price': [1200, 25, 75, 300, 50],
    'Quantity': [10, 50, 30, 5, 20]
}
df_dummy_excel = pd.DataFrame(dummy_data_excel)
df_dummy_excel.to_excel('dummy_data.xlsx', index=False)

print("Dummy 'dummy_data.csv' created.")
print("Dummy 'dummy_data.xlsx' created.")

# Now, let's read these files using pandas

try:
    df_csv = pd.read_csv('dummy_data.csv')
    print("\nSuccessfully read 'dummy_data.csv':")
    print(df_csv.head())
except FileNotFoundError:
    print("Error: 'dummy_data.csv' not found. Please ensure the file exists.")

try:
    df_excel = pd.read_excel('dummy_data.xlsx')
    print("\nSuccessfully read 'dummy_data.xlsx':")
    print(df_excel.head())
except FileNotFoundError:
    print("Error: 'dummy_data.xlsx' not found. Please ensure the file exists.")
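
When a project mixes CSV and Excel inputs, the two readers can be wrapped in one small function that picks the reader from the file extension. The sketch below is only an illustration: `load_table` is a hypothetical helper name (not part of pandas), and it assumes the extension matches the real file format.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # read_excel needs an Excel engine installed (openpyxl for .xlsx files)
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

With this in place, `load_table('dummy_data.csv')` should return the same DataFrame as `pd.read_csv('dummy_data.csv')`, and an unknown extension fails early with a clear message.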



## 🔍 Previewing Your Data

After loading your data, it's crucial to get a quick overview of its structure and content. Pandas DataFrames offer several methods for this:

- `.head()`: Displays the first 5 rows of the DataFrame (or a specified number).
- `.info()`: Prints a concise summary of the DataFrame, including column data types, non-null counts, and memory usage.
- `.shape`: Returns a tuple representing the dimensions of the DataFrame (rows, columns).

Let's inspect our loaded dataframes.


In [None]:

print("\n--- Previewing df_csv ---")
print(df_csv.head())
print("\n")
df_csv.info()  # .info() prints its own summary and returns None, so no print() is needed
print("\nShape of df_csv:", df_csv.shape)

print("\n--- Previewing df_excel ---")
print(df_excel.head())
print("\n")
df_excel.info()
print("\nShape of df_excel:", df_excel.shape)
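
Two details of the preview methods are easy to miss: `.head()` accepts a row count, and `.shape` is an attribute rather than a method. The short sketch below shows both on a small made-up DataFrame (the data is illustrative only).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [24, 27, 22, 32, 29],
})

first_two = df.head(2)       # only the first 2 rows instead of the default 5
n_rows, n_cols = df.shape    # shape is an attribute, so no parentheses

print(first_two)
print(f"{n_rows} rows x {n_cols} columns")

info_result = df.info()      # prints the summary itself and returns None
```

Because `.info()` returns `None`, wrapping it in `print()` adds a stray `None` line to the output.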



## 🎯 Selecting Columns During Import

Sometimes, you only need a subset of columns from a large dataset. Importing only the necessary columns can save memory and speed up your data loading process. You can achieve this using the `usecols` parameter in `read_csv()` and `read_excel()`.

This is especially useful for very wide datasets where you're only interested in a few specific features.


In [None]:

# Let's say we only need 'Name' and 'City' from our dummy_data.csv
df_selected_cols = pd.read_csv('dummy_data.csv', usecols=['Name', 'City'])

print("\nDataFrame with selected columns ('Name', 'City'):")
print(df_selected_cols.head())

# You can also select columns by their integer position (0-indexed)
# For example, to select the first two columns (Name and Age)
df_selected_by_pos = pd.read_csv('dummy_data.csv', usecols=[0, 1])

print("\nDataFrame with selected columns by position (0, 1):")
print(df_selected_by_pos.head())
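
One detail worth knowing about `usecols`: the order of the names in the list is ignored, and the columns come back in the order they appear in the file. The sketch below uses an in-memory CSV (via `io.StringIO`, an assumption made here to keep the example self-contained) and reorders the columns after loading.

```python
import io

import pandas as pd

csv_text = "Name,Age,City\nAlice,24,New York\nBob,27,Los Angeles\n"

# The order inside usecols is ignored; columns come back in file order
df_file_order = pd.read_csv(io.StringIO(csv_text), usecols=["City", "Name"])
print(list(df_file_order.columns))   # ['Name', 'City']

# Reorder explicitly after loading if a particular order is needed
df_reordered = df_file_order[["City", "Name"]]
print(list(df_reordered.columns))    # ['City', 'Name']
```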



## ✨ Cleaning Data: Handling Missing Values

Real-world data is often messy and contains missing values, which pandas represents as `NaN` (short for "Not a Number"). Handling these missing values is a crucial step in data cleaning. A common approach is to remove the rows or columns that contain `NaN` values using the `.dropna()` method, which drops rows by default.

**Important Note:** Dropping missing values should be done carefully, as it can lead to loss of valuable data. For this beginner-friendly notebook, we'll use a simple `dropna()` for demonstration. In more advanced scenarios, you might consider imputation (filling missing values with estimated ones).


In [None]:

# Let's create a dummy DataFrame with some missing values
data_with_nulls = {
    'Student_ID': [1, 2, 3, 4, 5],
    'Math_Score': [85, 92, None, 78, 95],
    'Science_Score': [70, None, 88, 91, 80],
    'English_Score': [90, 85, 75, None, 92]
}
df_marks = pd.DataFrame(data_with_nulls)

print("\nOriginal DataFrame with missing values:")
print(df_marks)

# Clean nulls: Drop rows with any missing values
df_cleaned_marks = df_marks.dropna()

print("\nDataFrame after dropping rows with null values:")
print(df_cleaned_marks)
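
Dropping every row with a gap is the bluntest option. As a rough sketch of the gentler alternatives mentioned above, the example below counts the gaps first, drops rows based on a single column with `subset`, and fills gaps with the column mean. Mean imputation is just one simple choice picked here for illustration, not a recommendation for every dataset.

```python
import pandas as pd

df_marks = pd.DataFrame({
    "Student_ID": [1, 2, 3, 4, 5],
    "Math_Score": [85, 92, None, 78, 95],
    "Science_Score": [70, None, 88, 91, 80],
})

# Count missing values per column before deciding what to do
print(df_marks.isna().sum())

# Drop rows only when Math_Score is missing; gaps in other columns are kept
df_math_only = df_marks.dropna(subset=["Math_Score"])

# Simple imputation: fill each score column with that column's mean
score_cols = ["Math_Score", "Science_Score"]
df_filled = df_marks.copy()
df_filled[score_cols] = df_filled[score_cols].fillna(df_filled[score_cols].mean())

print(df_math_only)
print(df_filled)
```

Here `df_math_only` keeps four of the five rows, while `df_filled` keeps all five with no missing values left.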



## 💾 Saving Files: Exporting Your Cleaned Data

Once you've processed and cleaned your data, you'll want to save it for future use or sharing. Pandas provides `.to_csv()` and `.to_excel()` methods for exporting DataFrames.

- `.to_csv()`: Exports the DataFrame to a CSV file.
- `.to_excel()`: Exports the DataFrame to an Excel file.

### 🚨 Mistake Alert: Unintended Index Column!

A common pitfall when saving to CSV is writing the DataFrame index as an extra, unnamed first column. By default, both `to_csv()` and `to_excel()` include the index. To prevent this, **use `index=False`** unless the index holds data you want to keep.


In [None]:

# Save the cleaned student marks to a CSV file
# INCORRECT WAY (will include index column)
df_cleaned_marks.to_csv('student_marks_cleaned_with_index.csv')
print("Saved 'student_marks_cleaned_with_index.csv' (check this file to see the extra index column).")

# CORRECT WAY (without index column)
df_cleaned_marks.to_csv('student_marks_cleaned_no_index.csv', index=False)
print("Saved 'student_marks_cleaned_no_index.csv' (this is the recommended way!).")

# You can also save to Excel
df_cleaned_marks.to_excel('student_marks_cleaned.xlsx', index=False)
print("Saved 'student_marks_cleaned.xlsx'.")
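
To see the pitfall without opening the files by hand, the sketch below writes a small made-up DataFrame to an in-memory buffer (`io.StringIO`, used here only to avoid touching the disk) and reads it back both ways.

```python
import io

import pandas as pd

df = pd.DataFrame({"Student_ID": [1, 5], "Math_Score": [85.0, 95.0]})

# Write with the default index=True, then read the text back
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# Same round trip with index=False
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
cols_no_index = list(pd.read_csv(no_index).columns)

print(cols_with_index)   # ['Unnamed: 0', 'Student_ID', 'Math_Score']
print(cols_no_index)     # ['Student_ID', 'Math_Score']
```

If a file was already saved with the index, passing `index_col=0` to `pd.read_csv()` reads that first column back as the index instead of as an `Unnamed: 0` data column.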



## 🏆 Mini Challenge

Put your new skills to the test! Your challenge is to:

1.  Load a CSV file (you can use `dummy_data.csv` or create your own).
2.  Introduce some null values if your data doesn't have any (optional, for practice).
3.  Clean the null values using `dropna()`.
4.  Save the cleaned data to a new CSV file, ensuring you use `index=False`.

Feel free to modify the `dummy_data.csv` or create a new one to experiment!


In [None]:

# Your code for the Mini Challenge goes here!
# Example (you can uncomment and run this, or write your own):

# df_challenge = pd.read_csv('dummy_data.csv')
# # Optional: Introduce a null value for demonstration
# # df_challenge.loc[0, 'Age'] = None
# print("\nChallenge: Original DataFrame (potentially with new nulls):")
# print(df_challenge)

# df_challenge_cleaned = df_challenge.dropna()
# print("\nChallenge: Cleaned DataFrame:")
# print(df_challenge_cleaned)

# df_challenge_cleaned.to_csv('challenge_output.csv', index=False)
# print("\nChallenge: Saved 'challenge_output.csv' with index=False.")



## 🎉 Conclusion

Congratulations! You've successfully learned how to read, preview, clean, and export data using pandas. These are fundamental skills for any data professional.

Remember the `index=False` trick when saving CSVs to keep your data clean and avoid unexpected columns.

Happy data wrangling! 🚀
