# Data Exploration and Visualization

This notebook demonstrates basic data preparation with pandas: loading the provided CSV datasets and a JSON record, then combining them with merges and concatenation.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.feature_selection import SelectKBest, f_regression
sns.set(style="whitegrid")

## Loading Data

In [2]:
# Load student list data
student_df = pd.read_csv('../csv-files/data_studentlist.csv')
student_df.head()

Unnamed: 0,name,gender,age,grade,absence,bloodtype,height,weight
0,Jared Diamond,M,23,3,Y,O,165.3,68.2
1,Sarah O'Donnel,F,22,2,N,AB,170.1,53.0
2,Brian Martin,M,24,4,N,B,175.0,80.1
3,David Hassel,M,23,3,N,AB,182.1,85.7
4,Clara Rodriquez,F,20,1,Y,A,168.0,49.5


In [3]:
# Load car data
car_df = pd.read_csv('../csv-files/data.csv')
car_df.head()

Unnamed: 0.1,Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name,ground_clearance_mm,height
0,0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu,198.991252,2354.830006
1,1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320,154.594484,2273.383573
2,2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite,185.36064,2607.497868
3,3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst,159.031913,2774.271298
4,4,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino,169.089797,2936.955633


In [4]:
# Load car data from a JSON string
from io import StringIO

json_data = '''[
    {
        "cylinders": 4,
        "displacement": 91.0,
        "hp": 53.0,
        "weight": 1795,
        "acceleration": 17.4,
        "model_year": 76,
        "country": "japan",
        "Name": "honda civic",
        "kmpl": 14.029752,
        "ground_clearance": 6.9923222018
    }
]'''
# Wrap the string in StringIO: passing a literal JSON string to read_json is deprecated in recent pandas versions
car_json_df = pd.read_json(StringIO(json_data))
car_json_df.head()

Unnamed: 0,cylinders,displacement,hp,weight,acceleration,model_year,country,Name,kmpl,ground_clearance
0,4,91,53,1795,17.4,76,japan,honda civic,14.029752,6.992322


## Merging Dataframes
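We will demonstrate merging dataframes on the car name using inner and outer joins. The key column is called `name` in `data.csv` and `Name` in the JSON data, so the calls use `left_on` and `right_on`. Left and right joins use the same call with `how='left'` or `how='right'`.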
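
As a quick orientation, the four `how` options can be compared on a tiny example. This is only a sketch: the frames `left` and `right` are stand-ins built from a few values shown in this notebook, and the `hypothetical car` row is invented purely so that the right-hand side has an unmatched key.

```python
import pandas as pd

# Stand-in rows, not the full files; values are taken from the outputs in this notebook
left = pd.DataFrame({
    "name": ["chevrolet chevelle malibu", "honda civic", "honda civic"],
    "mpg": [18.0, 24.0, 33.0],
})
right = pd.DataFrame({
    "Name": ["honda civic", "hypothetical car"],  # second row is made up for illustration
    "kmpl": [14.029752, 10.0],
})

# Run the same merge with each join type and record how many rows come back
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, left_on="name", right_on="Name", how=how)
    row_counts[how] = len(merged)
    print(f"{how:>5}: {len(merged)} rows")
```

The inner join keeps only keys present on both sides, the left and right joins keep every row of one side, and the outer join keeps every row of both.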

In [5]:
# Inner join
merged_inner = pd.merge(car_df, car_json_df, left_on='name', right_on='Name', how='inner')
merged_inner.head()

Unnamed: 0.1,Unnamed: 0,mpg,cylinders_x,displacement_x,horsepower,weight_x,acceleration_x,model_year_x,origin,name,...,cylinders_y,displacement_y,hp,weight_y,acceleration_y,model_year_y,country,Name,kmpl,ground_clearance
0,149,24.0,4,120.0,97.0,2489,15.0,74,japan,honda civic,...,4,91,53,1795,17.4,76,japan,honda civic,14.029752,6.992322
1,198,33.0,4,91.0,53.0,1795,17.4,76,japan,honda civic,...,4,91,53,1795,17.4,76,japan,honda civic,14.029752,6.992322


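The inner join returns two rows: both `honda civic` entries in `data.csv` match the single record in the JSON data. Columns that exist in both dataframes receive the suffixes `_x` (from `data.csv`) and `_y` (from the JSON data). An outer join, by contrast, also keeps the rows that have no match.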
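
One way to check how many rows actually matched is the `indicator=True` option of `pd.merge`, which adds a `_merge` column with the values `both`, `left_only`, or `right_only`. The sketch below runs on stand-in rows copied from the outputs above rather than on the full files; the names `car_rows` and `json_rows` are illustrative.

```python
import pandas as pd

# Stand-in rows copied from the notebook outputs (not the full datasets)
car_rows = pd.DataFrame({
    "name": ["chevrolet chevelle malibu", "honda civic", "honda civic"],
    "model_year": [70, 74, 76],
})
json_rows = pd.DataFrame({"Name": ["honda civic"], "kmpl": [14.029752]})

# indicator=True labels each row by origin; validate="m:1" raises an error
# if the right-hand key column contains duplicates
checked = pd.merge(
    car_rows, json_rows,
    left_on="name", right_on="Name",
    how="outer", indicator=True, validate="m:1",
)

n_matched = int((checked["_merge"] == "both").sum())
n_left_only = int((checked["_merge"] == "left_only").sum())
n_right_only = int((checked["_merge"] == "right_only").sum())
print(n_matched, n_left_only, n_right_only)
```

Here the two `honda civic` rows are labelled `both`, which is why a single JSON record can produce two rows in the inner join.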

In [6]:
# Outer join
merged_outer = pd.merge(car_df, car_json_df, left_on='name', right_on='Name', how='outer')
merged_outer.head()

Unnamed: 0.1,Unnamed: 0,mpg,cylinders_x,displacement_x,horsepower,weight_x,acceleration_x,model_year_x,origin,name,...,cylinders_y,displacement_y,hp,weight_y,acceleration_y,model_year_y,country,Name,kmpl,ground_clearance
0,0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu,...,,,,,,,,,,
1,35,17.0,6,250.0,100.0,3329,15.5,71,usa,chevrolet chevelle malibu,...,,,,,,,,,,
2,1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320,...,,,,,,,,,,
3,2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite,...,,,,,,,,,,
4,3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst,...,,,,,,,,,,


## Concatenating Dataframes
Next, we will concatenate the original car dataframe with the JSON dataframe.

In [7]:
concatenated_df = pd.concat([car_df, car_json_df], ignore_index=True)
concatenated_df.head()

Unnamed: 0.1,Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name,ground_clearance_mm,height,hp,country,Name,kmpl,ground_clearance
0,0.0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu,198.991252,2354.830006,,,,,
1,1.0,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320,154.594484,2273.383573,,,,,
2,2.0,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite,185.36064,2607.497868,,,,,
3,3.0,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst,159.031913,2774.271298,,,,,
4,4.0,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino,169.089797,2936.955633,,,,,


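The concatenated dataframe stacks the rows of both dataframes. Columns that exist in only one source (for example `hp`, `country`, `Name`, and `kmpl` from the JSON data) are filled with `NaN` for the rows of the other source, so equivalent fields such as `horsepower` and `hp` remain separate columns. `Unnamed: 0` also becomes a float column because the JSON row has no value for it. The combined dataset can be used for further analysis once these columns are reconciled.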
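
Because the two sources name equivalent fields differently, it can help to align the column names before concatenating. The following is a minimal sketch on stand-in rows, not the notebook's actual pipeline. It assumes that `hp`, `country`, and `Name` correspond to `horsepower`, `origin`, and `name`, and that `kmpl` means kilometres per litre, which converts to US miles per gallon with a factor of about 2.352145.

```python
import pandas as pd

# Stand-in rows copied from the notebook outputs (a subset of the columns)
car_rows = pd.DataFrame({
    "mpg": [18.0], "cylinders": [8], "horsepower": [130.0],
    "origin": ["usa"], "name": ["chevrolet chevelle malibu"],
})
json_rows = pd.DataFrame({
    "cylinders": [4], "hp": [53.0], "country": ["japan"],
    "Name": ["honda civic"], "kmpl": [14.029752],
})

# Assumption: kmpl is km per litre; 1 km/L is roughly 2.352145 US mpg
KM_PER_L_TO_MPG = 2.352145

# Rename the JSON columns to the CSV naming (assumed mapping) and convert units
aligned = json_rows.rename(
    columns={"hp": "horsepower", "country": "origin", "Name": "name"}
)
aligned["mpg"] = aligned["kmpl"] * KM_PER_L_TO_MPG
aligned = aligned.drop(columns="kmpl")

# With matching column names, concat no longer creates mostly empty columns
combined = pd.concat([car_rows, aligned], ignore_index=True)
print(combined)
```

With the names aligned, the result has a single set of columns and no missing values for these rows.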

## Summary
In this notebook, we:
- Read data from two CSV files
- Read data from a JSON string
- Performed merging (inner and outer join) on the dataframes
- Concatenated the dataframes

These operations are fundamental when working with multiple datasets in data analysis and machine learning projects.