---
# Importing and analyzing the dataset
---

---

## Data Analysis with Python and Pandas

---

### Overview
This notebook demonstrates the process of importing and analyzing a dataset using Python's Pandas library. The workflow covers loading data, inspecting its structure, and performing basic descriptive statistics.

### Data Import
The analysis begins by importing pandas, Python's library for manipulating tabular data. The file "mtcars2.csv" is read with `pd.read_csv` into a DataFrame named 'cars', a two-dimensional table with labeled rows and columns.
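
As a minimal sketch of the load step, the snippet below reads a few rows copied from the notebook's output through `io.StringIO` instead of from disk, so it runs without "mtcars2.csv". The name `sample_csv` is a stand-in for illustration, and the empty second header field is an assumption about how the real file is laid out; it is consistent with the model column showing up as `Unnamed: 1` in the output.

```python
import io

import pandas as pd

# Four rows copied from the notebook output. The blank second header
# field is an assumption about the real file's layout.
sample_csv = """S.No,,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
1,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
2,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
3,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
6,Valiant,18.1,6,225.0,105,2.76,3.46,,1,0,3,1
"""

# read_csv accepts a path or any file-like object.
cars = pd.read_csv(io.StringIO(sample_csv))

print(type(cars))           # the loaded object is a DataFrame
print(list(cars.columns))   # the header with no name becomes "Unnamed: 1"
```

With the real file, `pd.read_csv("mtcars2.csv")` plays the same role as the `io.StringIO` call here.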

### Data Exploration
Several methods are used to explore the dataset:

- **Head and Tail Methods**: These methods display the first and last rows of the DataFrame respectively, providing a quick glimpse of the data structure. Each shows five rows by default; an integer argument sets a different number of rows.

- **Shape Attribute**: This returns a tuple representing the dimensions of the DataFrame (rows, columns). This helps understand the size of the dataset being analyzed.

- **Info Method**: This prints a concise summary of the DataFrame: the index range, each column's data type and non-null count, and the memory usage. It's useful for spotting missing data and checking that columns were parsed with the expected types.
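
The exploration calls above can be sketched on a small frame. The four rows are copied from the notebook's output, but the column name `model` is a hypothetical label chosen for readability (the notebook's frame calls that column `Unnamed: 1`), and only a few of the 13 columns are included.

```python
import pandas as pd

# Four rows copied from the notebook output; "model" is a hypothetical
# column name used for illustration only.
cars = pd.DataFrame(
    {
        "model": ["Mazda RX4", "Datsun 710", "Valiant", "Merc 240D"],
        "mpg": [21.0, 22.8, 18.1, 24.4],
        "cyl": [6, 4, 6, 4],
        "qsec": [16.46, 18.61, None, 20.0],  # Valiant's qsec is missing
    }
)

first_two = cars.head(2)       # first 2 rows instead of the default 5
last_one = cars.tail(1)        # last row only
n_rows, n_cols = cars.shape    # shape is an attribute, so no parentheses

print(first_two)
print(last_one)
print(n_rows, n_cols)
cars.info()                    # prints dtypes and non-null counts, returns None
```

In the `info()` output, a non-null count below the number of rows (here for `qsec`) is the quickest signal that a column has missing values.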

### Descriptive Statistics
The notebook explores various statistical methods to understand the dataset:

- **Mean**: Calculates the arithmetic average of each numeric column, providing central tendency information.

- **Standard Deviation**: Measures the amount of variation or dispersion of the data points, indicating how spread out the values are from the mean.

- **Median**: Returns the middle value of each numeric column, a measure of central tendency that is less sensitive to outliers than the mean.

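- **Max and Min**: Shows the maximum and minimum values for each column, helping identify the range of data. For text columns such as the car names, the result is the last or first value in alphabetical order.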

- **Count Method**: Counts the number of non-null values in each column, which helps identify columns with missing data.

- **Describe Method**: Generates a statistical summary of the DataFrame, including count, mean, standard deviation, minimum, quartiles, and maximum values. This provides a comprehensive overview of the numerical data distribution.
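
A minimal sketch of these statistics on a four-row sample follows. The rows come from the notebook's output, while the column name `model` is a hypothetical label. One assumption worth flagging: in pandas 2.x, `mean`, `std`, and `median` raise a `TypeError` on a frame that contains text columns unless `numeric_only=True` is passed, so the sketch passes it explicitly.

```python
import pandas as pd

# Four rows copied from the notebook output; "model" is a hypothetical
# column name used for illustration only.
cars = pd.DataFrame(
    {
        "model": ["Mazda RX4", "Datsun 710", "Valiant", "Merc 240D"],
        "mpg": [21.0, 22.8, 18.1, 24.4],
        "hp": [110, 93, 105, 62],
        "qsec": [16.46, 18.61, None, 20.0],  # one missing value
    }
)

means = cars.mean(numeric_only=True)      # arithmetic mean per numeric column
stds = cars.std(numeric_only=True)        # sample standard deviation (ddof=1)
medians = cars.median(numeric_only=True)  # a method call: parentheses required
counts = cars.count()                     # non-null values per column
summary = cars.describe()                 # count, mean, std, min, quartiles, max

print(means)
print(stds)
print(medians)
print(counts)
print(summary)
```

Missing values are skipped by default, so the `qsec` statistics here are computed from three values rather than four. The `50%` row of `describe()` is the same number that `median` returns.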

### Dataset Content
The dataset contains information about various car models with measurements including miles per gallon (mpg), cylinder count (cyl), displacement (disp), horsepower (hp), and weight (wt), among others. This appears to be the famous "Motor Trend Car Road Tests" dataset, commonly used in data analysis examples. The dataset includes 32 observations and 13 columns: a serial number (S.No), the model name (read in as 'Unnamed: 1' because its header is blank), and 11 measurements. The 'qsec' column has three missing values (29 of 32 are non-null).
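
Two follow-up steps suggested by this description, giving the unnamed column a readable name and locating the missing 'qsec' values, could look like the sketch below. It is not part of the original notebook; the new name `model` is a hypothetical choice, and the three rows are copied from the notebook's output.

```python
import pandas as pd

# Three rows copied from the notebook output: two of the cars with a
# missing qsec value and one without.
cars = pd.DataFrame(
    {
        "Unnamed: 1": ["Valiant", "Merc 240D", "Fiat X1-9"],
        "qsec": [None, 20.0, None],
    }
)

# "model" is a hypothetical replacement name for the unnamed column.
cars = cars.rename(columns={"Unnamed: 1": "model"})

missing_per_column = cars.isna().sum()           # missing values per column
rows_missing_qsec = cars[cars["qsec"].isna()]    # rows where qsec is NaN

print(missing_per_column)
print(rows_missing_qsec)
```

On the full dataset, `cars.isna().sum()` should report 3 for 'qsec', matching the 29 non-null entries shown by `info()`.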

---

In [None]:
import pandas as pd        # importing pandas

In [None]:
cars = pd.read_csv("mtcars2.csv")      # reading the csv file
cars

Unnamed: 0,S.No,Unnamed: 1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,1,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,2,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,3,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,4,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,5,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
5,6,Valiant,18.1,6,225.0,105,2.76,3.46,,1,0,3,1
6,7,Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
7,8,Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
8,9,Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
9,10,Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [None]:
type(cars)        # checking the type of the object

pandas.core.frame.DataFrame

In [None]:
cars.head()      # checking the first 5 rows of the dataframe

Unnamed: 0,S.No,Unnamed: 1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,1,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,2,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,3,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,4,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,5,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2


In [None]:
cars.head(3)   # checking the first 3 rows of the dataframe

Unnamed: 0,S.No,Unnamed: 1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,1,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,2,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,3,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1


In [None]:
cars.tail()   # checking the last 5 rows of the dataframe

Unnamed: 0,S.No,Unnamed: 1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
27,28,Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
28,29,Ford Pantera L,15.8,8,351.0,264,4.22,3.17,14.5,0,1,5,4
29,30,Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6
30,31,Maserati Bora,15.0,8,301.0,335,3.54,3.57,14.6,0,1,5,8
31,32,Volvo 142E,21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2


In [None]:
cars.tail(10) # checking the last 10 rows of the dataframe

Unnamed: 0,S.No,Unnamed: 1,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
22,23,AMC Javelin,15.2,8,304.0,150,3.15,3.435,17.3,0,0,3,2
23,24,Camaro Z28,13.3,8,350.0,245,3.73,3.84,15.41,0,0,3,4
24,25,Pontiac Firebird,19.2,8,400.0,175,3.08,3.845,17.05,0,0,3,2
25,26,Fiat X1-9,27.3,4,79.0,66,4.08,1.935,,1,1,4,1
26,27,Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
27,28,Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
28,29,Ford Pantera L,15.8,8,351.0,264,4.22,3.17,14.5,0,1,5,4
29,30,Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6
30,31,Maserati Bora,15.0,8,301.0,335,3.54,3.57,14.6,0,1,5,8
31,32,Volvo 142E,21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2


In [None]:
cars.shape  # checking the shape of the dataframe

(32, 13)

In [None]:
cars.info() # checking the info of the dataframe

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   S.No        32 non-null     int64  
 1   Unnamed: 1  32 non-null     object 
 2   mpg         32 non-null     float64
 3   cyl         32 non-null     int64  
 4   disp        32 non-null     float64
 5   hp          32 non-null     int64  
 6   drat        32 non-null     float64
 7   wt          32 non-null     float64
 8   qsec        29 non-null     float64
 9   vs          32 non-null     int64  
 10  am          32 non-null     int64  
 11  gear        32 non-null     int64  
 12  carb        32 non-null     int64  
dtypes: float64(5), int64(7), object(1)
memory usage: 3.4+ KB


In [None]:
cars.median(numeric_only=True)   # median of each numeric column; median is a method, so it needs parentheses

S.No     16.500
mpg      19.200
cyl       6.000
disp    196.300
hp      123.000
drat      3.695
wt        3.325
qsec     17.420
vs        0.000
am        0.000
gear      4.000
carb      2.000
dtype: float64

In [None]:
cars.max()   # checking the maximum of the dataframe

S.No                  32
Unnamed: 1    Volvo 142E
mpg                 33.9
cyl                    8
disp               472.0
hp                   335
drat                4.93
wt                 5.424
qsec                22.9
vs                     1
am                     1
gear                   5
carb                   8
dtype: object

In [None]:
cars.min() # checking the minimum of the dataframe

S.No                    1
Unnamed: 1    AMC Javelin
mpg                  10.4
cyl                     4
disp                 71.1
hp                     52
drat                 2.76
wt                  1.513
qsec                 14.5
vs                      0
am                      0
gear                    3
carb                    1
dtype: object

In [None]:
cars.count() # counting the non-null values in each column

S.No          32
Unnamed: 1    32
mpg           32
cyl           32
disp          32
hp            32
drat          32
wt            32
qsec          29
vs            32
am            32
gear          32
carb          32
dtype: int64

In [None]:
cars.describe() # summary statistics for the numeric columns

Unnamed: 0,S.No,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
count,32.0,32.0,32.0,32.0,32.0,32.0,32.0,29.0,32.0,32.0,32.0,32.0
mean,16.5,20.090625,6.1875,230.721875,146.6875,3.596563,3.21725,17.674828,0.4375,0.40625,3.6875,2.8125
std,9.380832,6.026948,1.785922,123.938694,68.562868,0.534679,0.978457,1.780394,0.504016,0.498991,0.737804,1.6152
min,1.0,10.4,4.0,71.1,52.0,2.76,1.513,14.5,0.0,0.0,3.0,1.0
25%,8.75,15.425,4.0,120.825,96.5,3.08,2.58125,16.87,0.0,0.0,3.0,2.0
50%,16.5,19.2,6.0,196.3,123.0,3.695,3.325,17.42,0.0,0.0,4.0,2.0
75%,24.25,22.8,8.0,326.0,180.0,3.92,3.61,18.6,1.0,1.0,4.0,4.0
max,32.0,33.9,8.0,472.0,335.0,4.93,5.424,22.9,1.0,1.0,5.0,8.0


---