# Exploratory Data Analysis

Exploratory data analysis (EDA) is a common step in any data science or ML/AI workflow. We perform EDA to check a dataset's integrity and quality. Here we use the `pandas` library, which parses raw data into tabular data structures such as the `DataFrame` so that we can inspect and manipulate the data in Python.

In [2]:
# Install pandas (pip output is discarded to keep the notebook clean)

!pip install pandas > /dev/null 2>&1

In [3]:
import pandas as pd

Let's start by loading the data from a CSV file.

In [4]:
df = pd.read_csv('../../data/fortune500.csv')
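To see what `read_csv` produces without depending on the file path above, here is a minimal sketch that parses a small in-memory CSV string instead. The three sample rows mirror the first rows of the dataset as printed in the next section; `sample_csv` and `sample_df` are illustrative names that are not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical in-memory sample; the rows mirror the first rows of fortune500.csv
sample_csv = """Year,Rank,Company,Revenue (in millions),Profit (in millions)
1955,1,General Motors,9823.5,806
1955,2,Exxon Mobil,5661.4,584.8
1955,3,U.S. Steel,3250.4,195.4
"""

# read_csv accepts a file path or any file-like object, such as StringIO
sample_df = pd.read_csv(io.StringIO(sample_csv))
print(sample_df)
```

The first line of the input becomes the column labels, and each remaining line becomes one row with an automatically generated integer index.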

## Dataframe Shape/Size Inspection

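Here are some ways to inspect the shape or size of the dataframe: `len(df)` returns the number of rows, `df.shape` returns a `(rows, columns)` tuple, and `df.head()` previews the first five rows.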

In [7]:
print(f'#rows: {len(df)}')
print(f'(#rows, #cols): {df.shape}')

print()
print('First 5 rows:')
print(df.head())

#rows: 25500
(#rows, #cols): (25500, 5)

First 5 rows:
   Year  Rank           Company  Revenue (in millions) Profit (in millions)
0  1955     1    General Motors                 9823.5                  806
1  1955     2       Exxon Mobil                 5661.4                584.8
2  1955     3        U.S. Steel                 3250.4                195.4
3  1955     4  General Electric                 2959.1                212.6
4  1955     5            Esmark                 2510.8                 19.1
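A few other size-related calls can be useful at this stage. The sketch below uses a small stand-in frame (`demo_df`, a hypothetical name) rather than the full dataset, so it runs on its own; the same calls should work on `df`.

```python
import pandas as pd

# Hypothetical stand-in for df, small enough to read at a glance
demo_df = pd.DataFrame({
    "Year": [1955, 1955, 1955],
    "Rank": [1, 2, 3],
    "Company": ["General Motors", "Exxon Mobil", "U.S. Steel"],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = demo_df.shape
print(f"rows={n_rows}, cols={n_cols}")

# size is the total number of cells (rows * columns)
print(f"total cells: {demo_df.size}")

# info() prints column names, non-null counts, dtypes, and memory usage
demo_df.info()

# tail(n) shows the last n rows, complementing head()
print(demo_df.tail(2))
```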


## Dataframe Type Inspection

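The `dtypes` attribute lists the data type of each column. Note that `Profit (in millions)` is reported as `object` rather than a numeric type, which usually means the column contains at least one value that could not be parsed as a number.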


In [8]:
print('The data types of the columns')
print(df.dtypes)

The data types of the columns
Year                       int64
Rank                       int64
Company                   object
Revenue (in millions)    float64
Profit (in millions)      object
dtype: object
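One way to follow up on an `object` column that should be numeric, such as `Profit (in millions)` above, is to coerce it with `pd.to_numeric` and look at the values that fail to convert. The sketch below uses a small stand-in frame; the `'N.A.'` placeholder is an assumption made for illustration, and the real file may mark missing values differently.

```python
import pandas as pd

# Hypothetical stand-in; 'N.A.' is an assumed placeholder for missing values
demo_df = pd.DataFrame({"Profit (in millions)": ["806", "584.8", "N.A.", "19.1"]})
col = "Profit (in millions)"

# Convert to numbers; anything that cannot be parsed becomes NaN
as_numeric = pd.to_numeric(demo_df[col], errors="coerce")

# Rows where conversion failed reveal the offending raw strings
bad_values = demo_df.loc[as_numeric.isna(), col]
print(bad_values.unique())

# Replace the column with its numeric version and confirm the new dtype
demo_df[col] = as_numeric
print(demo_df.dtypes)
```

Inspecting the failed values before overwriting the column makes it easier to decide whether they are genuine missing data or formatting problems that need a different fix.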
