Eng-GB/Data-Processing-with-Pandas

Data Processing with Pandas

Introduction

This project demonstrates how to read and inspect several file formats using the Python pandas library. It covers reading CSV, tab-separated text, JSON, and Excel files, displaying the data, and performing basic operations such as checking data types, counting values, and viewing the first and last rows.

pandas is a data analysis and manipulation library built around the DataFrame, a labelled table of rows and columns. Each file format below is loaded into a DataFrame, which is then inspected with a few built-in methods and attributes.

Data Import and Inspection

The script starts by checking the installed pandas version:

import pandas as pd
print(pd.__version__)

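Next, the script reads several file formats. Each snippet ends with a bare variable name, which displays the DataFrame when run in a notebook such as Jupyter; in a plain script, wrap it in print():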

# Reading a CSV file
df = pd.read_csv('/Users/mohamed/Desktop/Data Science/Ice Cream Ratings.csv')
df

CSV files are commonly used for storing tabular data. The read_csv function parses the file into a DataFrame and, by default, uses the first row as the column headers.

# Reading a tab-separated text file
df2 = pd.read_csv('/Users/mohamed/Desktop/Data Science/countries of the world.txt', sep='\t')
df2

The sep='\t' parameter is used because the text file is tab-separated.
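To illustrate why the separator matters, here is a minimal sketch that parses the same text twice. The sample text, its column names, and the variable names are made up for illustration and stand in for the real file; io.StringIO lets read_csv read from a string instead of a path.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample (not the real countries file)
raw = "name\tgroup\tscore\nalpha\tx\t1\nbeta\ty\t2\n"

# Default separator is a comma, so each line ends up in a single column
comma_parsed = pd.read_csv(io.StringIO(raw))

# sep='\t' splits each line on tabs, giving one column per field
tab_parsed = pd.read_csv(io.StringIO(raw), sep="\t")

print(comma_parsed.shape)        # one column only
print(tab_parsed.shape)          # three columns
print(list(tab_parsed.columns))
```

If a loaded DataFrame unexpectedly has a single column, a mismatched separator is a likely cause.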

# Reading a JSON file
js = pd.read_json('/Users/mohamed/Desktop/Data Science/json_sample.json')
js

JSON is a common format for storing structured data, often used in APIs.
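As a rough sketch of how read_json maps JSON onto a table, the example below parses a small list of records. The payload and the names raw and js_demo are hypothetical and are not taken from json_sample.json, whose layout is not shown here.

```python
import io
import pandas as pd

# Hypothetical payload: a list of records, similar to what an API might return
raw = '[{"name": "alpha", "score": 1}, {"name": "beta", "score": 2}]'

# Each record becomes a row; each key becomes a column
js_demo = pd.read_json(io.StringIO(raw))

print(js_demo)
print(list(js_demo.columns))
```

Deeply nested JSON may not map cleanly onto rows and columns, so the result is worth inspecting after loading.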

# Reading an Excel file
dd = pd.read_excel('/Users/mohamed/Desktop/Data Science/world_population_excel_workbook.xlsx', sheet_name='Sheet1')
dd

The read_excel function reads spreadsheets. Because a workbook can contain multiple sheets, the sheet_name parameter selects which one to load (here, 'Sheet1').

Data Exploration

Once the files are loaded, various commands help in exploring the data:

# Set display options
pd.set_option('display.max_rows', 20)
dd

With this setting, any DataFrame longer than 20 rows is printed as a truncated view rather than in full, which avoids excessive output.
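The effect of display.max_rows can be checked on a small synthetic DataFrame. This is a sketch under assumptions: the frame demo is made up, and pd.option_context is used instead of set_option so the setting applies only inside the with-block and the session's previous value is restored afterwards.

```python
import pandas as pd

# Synthetic 100-row frame, used only to trigger truncation
demo = pd.DataFrame({"n": range(100)})

# Temporarily cap the display at 20 rows
with pd.option_context("display.max_rows", 20):
    truncated = repr(demo)

# The truncated text has far fewer lines than the frame has rows
print(len(truncated.splitlines()))
print(truncated)
```

The option changes only how a DataFrame is printed; the underlying data keeps all of its rows.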

dd.info()          # Print column names, non-null counts, dtypes, and memory usage
dd.shape           # Tuple of (number of rows, number of columns)
dd.size            # Total number of elements (rows x columns)
dd.value_counts()  # Count how often each unique row occurs
dd.dtypes          # Data type of each column

These methods and attributes provide an overview of the dataset, including its dimensions, column data types, and repeated rows.
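The sketch below runs the same inspection calls on a tiny made-up DataFrame so the results are easy to verify by hand. The column names and values are hypothetical and are not taken from the Excel workbook.

```python
import pandas as pd

# Hypothetical data: four rows, two columns
demo = pd.DataFrame({
    "flavor": ["vanilla", "mint", "vanilla", "vanilla"],
    "rating": [5, 3, 5, 4],
})

rows, cols = demo.shape   # number of rows, number of columns
total = demo.size         # rows * cols

# On a DataFrame, value_counts() counts unique rows (flavor/rating pairs)
row_counts = demo.value_counts()

# On a single column, value_counts() counts the values in that column
flavor_counts = demo["flavor"].value_counts()

print(rows, cols, total)
print(row_counts)
print(flavor_counts)
print(demo.dtypes)
```

To count the values of one column rather than whole rows, select the column first, as in demo["flavor"].value_counts().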

dd.head(20)  # Display the first 20 rows
dd.tail(10)  # Display the last 10 rows

These methods give a quick look at the beginning and end of the dataset. Called without an argument, each returns five rows.
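A minimal sketch of head and tail on a synthetic 30-row frame follows; demo and its single column n are made up for illustration.

```python
import pandas as pd

# Synthetic frame with values 0..29
demo = pd.DataFrame({"n": range(30)})

first_rows = demo.head(20)   # rows 0 through 19
last_rows = demo.tail(10)    # rows 20 through 29
default_rows = demo.head()   # no argument: first 5 rows

print(len(first_rows), len(last_rows), len(default_rows))
print(last_rows["n"].tolist())
```

Both methods return new DataFrames and leave the original unchanged.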

Conclusion

This project demonstrates how to load datasets from CSV, tab-separated text, JSON, and Excel files with pandas and how to inspect them with a handful of built-in methods and attributes. Once the data is in a DataFrame, the same inspection calls work regardless of the original file format.

Understanding these basic operations provides a solid foundation for more advanced data analysis tasks such as filtering, grouping, and visualization.
