This project demonstrates how to read and analyze various file formats using the Python pandas library. It covers reading CSV, tab-separated text, JSON, and Excel files, displaying data, and performing basic operations such as checking data types, counting values, and viewing the first and last rows.
Pandas is a powerful data analysis and manipulation tool that makes it easy to work with structured data. This project provides an overview of how to import different data formats and perform basic operations to inspect the dataset.
The script starts by checking the installed pandas version:
import pandas as pd
print(pd.__version__)

Next, the script reads various file formats:
# Reading a CSV file
df = pd.read_csv('/Users/mohamed/Desktop/Data Science/Ice Cream Ratings.csv')
df

CSV files are commonly used for storing tabular data. The read_csv function loads such a file into a DataFrame.
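To try read_csv without the file from the script, the following minimal sketch reads CSV text from memory instead. The column names and values are invented for illustration and are not taken from the Ice Cream Ratings file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; columns and values are made up.
csv_text = """Flavor,Rating
Vanilla,8
Chocolate,9
Strawberry,7
"""

# read_csv accepts a path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.columns.tolist())
```

The first line of the text is treated as the header row, so the DataFrame gets two named columns and three data rows.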
# Reading a tab-separated text file
df2 = pd.read_csv('/Users/mohamed/Desktop/Data Science/countries of the world.txt', sep='\t')
df2

The sep='\t' parameter is used because the text file is tab-separated.
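The effect of sep can be seen by parsing the same text with and without it. This is a small sketch with made-up tab-separated data, not the contents of the countries file.

```python
import io

import pandas as pd

# Hypothetical tab-separated text; names and numbers are invented.
tsv_text = "Name\tRegion\tCount\nAlpha\tNorth\t1\nBeta\tSouth\t2\n"

# With sep="\t" each tab starts a new column.
with_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# With the default comma separator, every line collapses into a single column.
default_sep = pd.read_csv(io.StringIO(tsv_text))

print(with_tab.shape)
print(default_sep.shape)
```

Comparing the two shapes is a quick way to confirm that the separator matches the file.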
# Reading a JSON file
js = pd.read_json('/Users/mohamed/Desktop/Data Science/json_sample.json')
js

JSON is a common format for storing structured data, often used in APIs.
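As a rough illustration of how read_json maps JSON onto a table, the sketch below parses a list of records held in memory. The field names are assumptions for the example and say nothing about the structure of json_sample.json.

```python
import io

import pandas as pd

# Hypothetical JSON payload: a list of records, one object per row.
json_text = '[{"name": "Ada", "score": 91}, {"name": "Linus", "score": 85}]'

# Wrapping the text in StringIO gives read_json a file-like object to read.
js = pd.read_json(io.StringIO(json_text))

print(js)
print(js.columns.tolist())
```

Each object becomes a row and each key becomes a column. Deeply nested JSON usually needs extra flattening before it fits this shape.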
# Reading an Excel file
dd = pd.read_excel('/Users/mohamed/Desktop/Data Science/world_population_excel_workbook.xlsx', sheet_name='Sheet1')
dd

The read_excel function reads spreadsheets. Because a workbook can contain multiple sheets, the sheet_name parameter selects which one to load.
Once the files are loaded, various commands help in exploring the data:
# Set display options
pd.set_option('display.max_rows', 20)
dd

This caps the display at 20 rows. A DataFrame with more rows is shown in truncated form rather than in full.
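Note that set_option changes the setting for the rest of the session. One possible alternative, sketched here on an invented 100-row DataFrame, is pd.option_context, which applies the limit only inside a with block.

```python
import pandas as pd

# Hypothetical DataFrame that is long enough to be truncated.
df = pd.DataFrame({"n": range(100)})

before = pd.get_option("display.max_rows")

# The option is changed only inside the with block.
with pd.option_context("display.max_rows", 20):
    inside = pd.get_option("display.max_rows")
    text = repr(df)  # truncated textual form of the DataFrame

after = pd.get_option("display.max_rows")

print(inside, after)
print(text)
```

This keeps the global default untouched, which may be preferable in shared notebooks.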
dd.info()
dd.shape # Returns the number of rows and columns
dd.size # Total number of elements in the DataFrame
dd.value_counts() # Count how often each unique row occurs
dd.dtypes # Check data types of each column

These functions provide an overview of the dataset, including its structure and data types.
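The sketch below runs the same inspection calls on a tiny invented DataFrame so the results are easy to verify by hand. The column names are assumptions and do not come from the world population workbook.

```python
import pandas as pd

# Hypothetical data: four rows, three columns.
dd = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Continent": ["X", "X", "Y", "X"],
    "Population": [10, 20, 30, 40],
})

rows, cols = dd.shape  # shape is a (rows, columns) tuple
total = dd.size        # rows * columns
print(rows, cols, total)
print(dd.dtypes)

# On the whole DataFrame, value_counts counts identical rows.
row_counts = dd.value_counts()

# On a single column, it counts how often each value appears.
continent_counts = dd["Continent"].value_counts()
print(continent_counts)
```

Here every row is distinct, so the DataFrame-level counts are all 1. The per-column call is usually the more informative one.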
dd.head(20) # Display the first 20 rows
dd.tail(10) # Display the last 10 rows

These functions help quickly inspect the beginning and end of the dataset.
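For completeness, here is a small sketch of head and tail alongside basic row and column selection, using an invented 30-row DataFrame. The iloc and loc calls are standard pandas indexers but are not part of the original script.

```python
import pandas as pd

# Hypothetical data: 30 rows with a rank and a derived value.
dd = pd.DataFrame({
    "rank": range(1, 31),
    "value": [r * 2 for r in range(1, 31)],
})

first = dd.head()   # with no argument, head returns the first 5 rows
last = dd.tail(3)   # last 3 rows

column = dd["value"]  # a single column, returned as a Series
row = dd.iloc[0]      # the first row, selected by position

# Rows where rank <= 3, restricted to the two named columns.
subset = dd.loc[dd["rank"] <= 3, ["rank", "value"]]

print(first)
print(last)
print(subset)
```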
This project demonstrates how to load and explore datasets using pandas. By importing different file formats and utilizing various functions, users can efficiently analyze and manipulate data. Pandas simplifies working with structured data, making it an essential tool for data science and analytics.
Understanding these basic operations provides a solid foundation for more advanced data analysis tasks such as filtering, grouping, and visualization.