Data Processing with Pandas

Introduction

This project demonstrates how to read and inspect several file formats with the Python pandas library. It covers reading CSV, tab-separated text, JSON, and Excel files, displaying the data, and performing basic operations such as checking data types, counting values, and accessing specific rows and columns.

Pandas is a data analysis and manipulation library built around the DataFrame, a labeled two-dimensional table. The sections below show how each format is imported and which basic operations help inspect the resulting dataset.

Data Import and Inspection

The script starts by checking the installed pandas version:

import pandas as pd
print(pd.__version__)

Next, it reads several file formats. Each call returns a DataFrame, and a bare variable name on the last line of a notebook cell displays it:

# Reading a CSV file
df = pd.read_csv('/Users/mohamed/Desktop/Data Science/Ice Cream Ratings.csv')
df

CSV files are commonly used for storing tabular data. By default, read_csv splits fields on commas and takes the column names from the first row.
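The behaviour of read_csv can be tried without any file on disk. The following is a minimal sketch, assuming a small made-up table: the column names Flavor and Rating and all values are hypothetical and are not taken from the Ice Cream Ratings file.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
csv_text = "Flavor,Rating\nVanilla,8\nChocolate,9\nStrawberry,7\n"

# read_csv accepts a path or any file-like object, so StringIO works here.
ratings = pd.read_csv(io.StringIO(csv_text))

print(ratings.columns.tolist())  # column names taken from the first row
print(len(ratings))              # number of data rows, header excluded
```

Passing a file-like object instead of a path is convenient for quick experiments and tests, since the example does not depend on a local directory layout.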

# Reading a tab-separated text file
df2 = pd.read_csv('/Users/mohamed/Desktop/Data Science/countries of the world.txt', sep='\t')
df2

The sep='\t' parameter is needed because the fields in this text file are separated by tabs rather than by commas, which is the read_csv default.
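To see why the separator matters, the sketch below parses the same tab-separated text twice, once with the default separator and once with sep='\t'. The column names and values are hypothetical placeholders, not the contents of the countries file.

```python
import io

import pandas as pd

# Hypothetical tab-separated text; the real file's columns may differ.
tsv_text = "Country\tRegion\tPopulation\nX\tNorth\t100\nY\tSouth\t200\n"

# With the default comma separator, each line is read as a single field.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep='\t', the fields are split into separate columns.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)
print(right.shape)
```

A DataFrame with only one column right after loading is a common sign that the separator does not match the file.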

# Reading a JSON file
js = pd.read_json('/Users/mohamed/Desktop/Data Science/json_sample.json')
js

JSON is a common format for structured data and is often used in API responses. The read_json function converts it into a DataFrame.
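As an illustration, the sketch below loads a JSON array of records. The structure of json_sample.json is not shown in this README, so the list-of-objects layout and the field names id and name are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical JSON: a list of objects, one object per row.
json_text = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'

# Each object becomes a row and each key becomes a column.
records = pd.read_json(io.StringIO(json_text))

print(records)
```

Deeply nested JSON usually needs extra flattening before it fits a table; this sketch only covers the flat case.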

# Reading an Excel file
dd = pd.read_excel('/Users/mohamed/Desktop/Data Science/world_population_excel_workbook.xlsx', sheet_name='Sheet1')
dd

The read_excel function reads spreadsheets. A workbook can contain multiple sheets, so sheet_name selects the one to load; if it is omitted, the first sheet is read.

Data Exploration

Once the files are loaded, various commands help in exploring the data:

# Set display options
pd.set_option('display.max_rows', 20)
dd

With this setting, a DataFrame longer than 20 rows is displayed in truncated form, which avoids excessive output.
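pd.set_option changes the setting for the rest of the session. Where a temporary change is enough, pd.option_context can be used instead, as in the sketch below. The 100-row frame is made up for the example.

```python
import pandas as pd

# Hypothetical frame that is longer than the display limit.
frame = pd.DataFrame({"n": range(100)})

before = pd.get_option("display.max_rows")

# The option applies only inside the with-block and is restored afterwards.
with pd.option_context("display.max_rows", 20):
    inside = pd.get_option("display.max_rows")
    text = repr(frame)  # truncated view with a row/column count footer

after = pd.get_option("display.max_rows")

print(text)
print(before == after)
```

The option affects only how a DataFrame is printed; the data itself is unchanged.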

dd.info()          # Column names, non-null counts, dtypes, and memory usage
dd.shape           # Tuple of (number of rows, number of columns)
dd.size            # Total number of elements (rows * columns)
dd.value_counts()  # Count how often each unique row occurs
dd.dtypes          # Data type of each column

These methods and attributes give an overview of the dataset's structure and data types. Note that shape, size, and dtypes are attributes, so they are written without parentheses.
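The sketch below runs the same inspection calls on a small made-up frame, so the relationship between them is easy to check. The column names Continent, Rank, and Density and all values are hypothetical and do not come from the Excel workbook.

```python
import pandas as pd

# Hypothetical data; the real workbook's columns may differ.
population = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe", "Africa"],
    "Rank": [1, 2, 3, 4],
    "Density": [10.5, 20.0, 30.25, 40.0],
})

population.info()                  # prints a summary, returns None

rows, cols = population.shape      # (rows, columns)
total = population.size            # always rows * cols

# On a single column, value_counts counts each distinct value.
per_column = population["Continent"].value_counts()

# On the whole frame, it counts each distinct combination of row values.
per_row = population.value_counts()

print(population.dtypes)
print(per_column)
```

In practice, value_counts is most often called on one column, because counting whole rows mostly reveals exact duplicates.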

dd.head(20)  # Display the first 20 rows
dd.tail(10)  # Display the last 10 rows

These methods give a quick look at the beginning and end of the dataset. Called without an argument, each returns five rows.
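A short sketch of head and tail on a made-up 50-row frame shows which rows each call returns. The frame and its value column are hypothetical.

```python
import pandas as pd

# Hypothetical frame with 50 rows numbered 0 to 49.
frame = pd.DataFrame({"value": range(50)})

first_default = frame.head()   # no argument: first 5 rows
first_twenty = frame.head(20)  # rows 0 through 19
last_ten = frame.tail(10)      # rows 40 through 49

print(first_twenty.index.max())
print(last_ten["value"].tolist())
```

Both methods return new DataFrames and keep the original index labels, so rows from tail still show where they sit in the full dataset.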

Conclusion

This project demonstrates how to load and explore datasets using pandas. By importing different file formats and utilizing various functions, users can efficiently analyze and manipulate data. Pandas simplifies working with structured data, making it an essential tool for data science and analytics.

Understanding these basic operations provides a solid foundation for more advanced data analysis tasks such as filtering, grouping, and visualization.
