In [None]:
# Author of this notebook: Sarzil Hossain
# This is "Part 1" of a series of tutorial notebooks

# Initial Step 1: Importing the Necessary Libraries

In [None]:
import pandas as pd

# This will import the Pandas library.
# Since "pandas" is a long word to type every time, we give it the short alias pd


from urllib.request import urlretrieve

# urllib is the name of a package in Python's standard library
# request is the name of a module inside that package
# urlretrieve is the name of a function inside that module

# from urllib.request import urlretrieve means, from the request module inside the urllib package,
#     get me the urlretrieve function
    
# With the urlretrieve function, we will download the dataset
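
If you want to check for yourself what kind of thing each name is, here is a small sketch using the standard `inspect` module. The variable names are made up for this illustration.

```python
import inspect
import urllib.request
from urllib.request import urlretrieve

# urllib is a package, and urllib.request is a module inside it
is_module = inspect.ismodule(urllib.request)

# urlretrieve is a plain function defined in that module
is_function = inspect.isfunction(urlretrieve)

# The imported name and the "long" name point to the very same function
same_object = urlretrieve is urllib.request.urlretrieve

print(is_module, is_function, same_object)
```

All three values should print as `True`.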

In [None]:
"""
Run this cell ONLY IF the previous cell raised an error.
If pandas is not installed, you need to install it.
urllib is part of Python's standard library, so it does not need to be installed
"""

!pip install pandas

# Initial Step 2: Downloading the Dataset

In [None]:
# Let's download the file first.

dataset_url = "https://raw.githubusercontent.com/prmethus/IBM-FuelConsumption_Pipeline_FastAPI/main/Data/FuelConsumption.csv"
save_as = "FuelConsumption.csv"

# urlretrieve(url, save_to) is a function you can use to download files
urlretrieve(dataset_url, save_as)

In [None]:
# Boom, Download complete! Python is magic.
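
Running the download cell again fetches the file again. One possible way to avoid that is sketched below. `download_if_missing` is a hypothetical helper written for this notebook, not something that ships with `urllib`.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url, save_as):
    """Download url to save_as unless that file already exists.

    Hypothetical helper for this tutorial.
    Returns True if a download happened, False if it was skipped.
    """
    target = Path(save_as)
    if target.exists():
        return False  # the file is already there, so skip the download
    urlretrieve(url, str(target))
    return True  # a fresh copy was downloaded
```

Calling `download_if_missing(dataset_url, save_as)` twice in a row should download the file the first time and skip it the second time.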

# Initial Step 3: Getting the Data

In [None]:
# Let's create a Pandas Dataframe from the CSV dataset
# or in simple words, "Let's take the CSV file and use it"

df = pd.read_csv("FuelConsumption.csv")

# The dataframe is saved as 'df', short for "dataframe"
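
To get a feel for what `read_csv` does, here is a tiny sketch that reads a CSV held in memory instead of a file on disk. The two rows are made up for illustration and are not taken from the real dataset.

```python
import io
import pandas as pd

# A tiny made-up CSV: the first line holds the column names
csv_text = "MAKE,CYLINDERS\nACURA,4\nBMW,6\n"

# io.StringIO lets read_csv treat a string as if it were a file
small_df = pd.read_csv(io.StringIO(csv_text))

print(type(small_df))
print(small_df)
```

The first line of the CSV becomes the column names, and every following line becomes one row.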

# Q: What does the data look like?

In [None]:
# Let's PRINT THE WHOLE DATAFRAME!!!

df

**Uh, oh, that's a lot of data. And I don't understand anything :/**

Let's make things a bit *clearer* by just looking at the first 6 rows

In [None]:
# Let's see the first 6 rows

df.head(6)

# 'head' is a dataframe method that shows the first rows.
# The number in the brackets says how many rows (here 6). Without a number, it shows 5.

In [None]:
# Now let's see the last 6 rows

df.tail(6)

# 'tail' is a dataframe method that shows the last rows.
# The number in the brackets says how many rows (here 6). Without a number, it shows 5.

WAIT A MINUTE. DATAFRAMES GOT HEADS AND TAILS?  
That's an intuitive naming convention that is easy to remember.
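
Here is a small sketch of how the number passed to `head` and `tail` changes the result. The `numbers` dataframe is made up so that the rows are easy to count.

```python
import pandas as pd

# Made-up data: ten rows holding the values 0 to 9
numbers = pd.DataFrame({"value": range(10)})

default_head = numbers.head()   # no number given: first 5 rows
first_three = numbers.head(3)   # first 3 rows
last_two = numbers.tail(2)      # last 2 rows

print(len(default_head), len(first_three), len(last_two))
```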

Let's randomly pick 6 rows and look at them, using the 'sample' method.

Why 6? No special reason. Just put in how many rows ('data points') you want to see. It can be 5, 6, 10, 20, whatever

In [None]:
df.sample(6)
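
Every run of `sample` picks different rows. If you want the same "random" rows each time, `sample` accepts a `random_state` argument. A minimal sketch on made-up data (the value 42 is an arbitrary choice):

```python
import pandas as pd

# Made-up data: one hundred rows holding the values 0 to 99
numbers = pd.DataFrame({"value": range(100)})

# The same random_state gives the same rows every time
pick_a = numbers.sample(6, random_state=42)
pick_b = numbers.sample(6, random_state=42)

print(pick_a.equals(pick_b))
```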

## Q: How many rows and columns does it have?

In [None]:
df.shape # :/ yeah, that's all. It gives (rows, columns)

So,
1067 rows and 13 columns
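
Since `shape` is a tuple of (rows, columns), the two numbers can be pulled apart into separate variables. A small sketch on a made-up dataframe:

```python
import pandas as pd

# Made-up data: 3 rows and 2 columns
tiny = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = tiny.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(tiny))              # len() gives the number of rows only
```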

## Q: What are the columns?

In [None]:
df.columns

So the columns are:
    1. MODELYEAR - The model year of the car  
    2. MAKE - The manufacturer  
    3. MODEL - The car's model  
    4. VEHICLECLASS - Whether the car is an SUV, coupe, sedan, etc.  
    5. ENGINESIZE - The size of the engine in litres  
    6. CYLINDERS - Number of cylinders  
    7. TRANSMISSION - 'A' means automatic, 'M' means manual. The trailing digit is the number of gears  
    8. FUELTYPE - What type of fuel the car uses  
    9. FUELCONSUMPTION_CITY, 
       FUELCONSUMPTION_HWY, 
       FUELCONSUMPTION_COMB 
       - Fuel used in city, highway and combined driving, in litres per 100 km  
       FUELCONSUMPTION_COMB_MPG 
       - Combined fuel consumption in miles per gallon  
    10. CO2EMISSIONS - Carbon dioxide emissions in grams per kilometre

## Q: What are the data types of the columns? (Numerical / Categorical)
int64 - 64 bit integer (can be 8, 16 or 32 bit as well)  
float64 - 64 bit floating point number (can be 16 or 32 bit as well)  
object - most commonly, strings

In [None]:
df.dtypes
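
Once you know the data types, you can split the columns into numerical ones and the rest with `select_dtypes`. A minimal sketch on a made-up dataframe that borrows three column names from the dataset:

```python
import pandas as pd

# Made-up rows, for illustration only
cars = pd.DataFrame({
    "MAKE": ["ACURA", "BMW"],
    "CYLINDERS": [4, 6],
    "ENGINESIZE": [2.0, 3.0],
})

# "number" matches every integer and floating point column
numeric_cols = cars.select_dtypes(include="number").columns.tolist()

# Everything that is not a number (here: the text column)
other_cols = cars.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```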

# Basic Statistics - Summarizing

The 'describe' method returns a DataFrame containing a summary (count, mean, standard deviation, minimum, quartiles and maximum) of the numerical columns.

In [None]:
df.describe() 
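
`describe` also works on a single text column, where it reports different things: how many values there are, how many are unique, and which one appears most often. A small sketch on made-up data:

```python
import pandas as pd

# Made-up rows, for illustration only
cars = pd.DataFrame({
    "MAKE": ["BMW", "ACURA", "BMW"],
    "CYLINDERS": [6, 4, 8],
})

numeric_summary = cars.describe()      # only the numerical column is summarized
make_summary = cars["MAKE"].describe() # count, unique, top, freq

print(numeric_summary)
print(make_summary)
```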

# Saving Files
## Let's save the stats in a file called "FuelConsumption_Summary.csv"

The copy() method makes a copy of a dataframe.

The to_csv("filename.csv") method saves the dataframe as a CSV file.

In [None]:
# First, we make a "copy" of the summary dataframe.
# (describe() already returns a new dataframe, so the copy is optional here.)

df_summary = df.describe().copy()
df_summary.to_csv("FuelConsumption_Summary.csv")
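
When a summary is saved, the row labels (count, mean, std, ...) become the first column of the CSV. To get them back as row labels when reading, `read_csv` accepts `index_col=0`. The sketch below uses made-up data and writes to an in-memory buffer instead of a real file, just to stay self-contained.

```python
import io
import pandas as pd

# Made-up rows, for illustration only
cars = pd.DataFrame({"CYLINDERS": [4, 6, 8], "ENGINESIZE": [2.0, 3.0, 5.0]})
summary = cars.describe()

# Save the summary as CSV text into a buffer
buffer = io.StringIO()
summary.to_csv(buffer)
buffer.seek(0)  # go back to the start before reading

# index_col=0 turns the first column back into the row labels
restored = pd.read_csv(buffer, index_col=0)
print(restored.loc["mean", "CYLINDERS"])
```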

# Congrats

You now know how to:
1. Import libraries
2. Download a file from the internet and save it  
3. Take a look at the data
4. Do Basic Statistics
5. Save a Dataframe as a CSV file

Keep practising!