Perform the following operations using R/Python on suitable data sets:
a) read data from different formats (like csv, xls)
b) index and select data, sort data
c) describe attributes of data, check the data type of each column
d) count unique values of data, check the format of each column, convert variable data types (e.g. from long to short and vice versa)
e) identify missing values and fill them in

In [None]:
# importing libraries
import numpy as np
import pandas as pd

**a) Read data from different formats (like csv, xls)**

In [None]:
# read data from a CSV file (pd.read_excel handles xls/xlsx files)
sample = pd.read_csv("/content/insurance - insurance.csv")
sample

,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500
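
The cell above only reads a CSV file, while the task also mentions other formats. The sketch below round-trips a small, hypothetical three-row frame (the `toy` data and the temporary file names are assumptions, not part of the insurance file) through CSV, JSON, and Excel. The Excel step assumes an optional engine such as `openpyxl` is installed and is skipped otherwise.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {"age": [19, 18, 28], "sex": ["female", "male", "male"], "bmi": [27.9, 33.77, 33.0]}
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy.csv"
    json_path = Path(tmp) / "toy.json"
    xlsx_path = Path(tmp) / "toy.xlsx"

    # CSV: index=False keeps the row index from being written as an extra column.
    toy.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # JSON: one record (dict) per row.
    toy.to_json(json_path, orient="records")
    from_json = pd.read_json(json_path)

    # Excel: requires an optional engine; skip the step if none is installed.
    try:
        toy.to_excel(xlsx_path, index=False)
        from_excel = pd.read_excel(xlsx_path)
    except ImportError:
        from_excel = None

print(from_csv)
print(from_json)
print(from_excel)
```

Writing with `index=False` is what prevents an extra `Unnamed: 0` column from appearing when the file is read back.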


**b) Indexing and selecting data, sort data**

In [None]:
# indexing and selecting data through Columns
sample[["age", "sex", "charges"]]

,age,sex,charges
0,19,female,16884.92400
1,18,male,1725.55230
2,28,male,4449.46200
3,33,male,21984.47061
4,32,male,3866.85520
...,...,...,...
1333,50,male,10600.54830
1334,18,female,2205.98080
1335,18,female,1629.83350
1336,21,female,2007.94500


In [None]:
# indexing and selecting rows by label (loc slices include the end label, so rows 0 through 8 are returned)
sample.loc[0:8]

,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552
5,31,female,25.74,0,no,southeast,3756.6216
6,46,female,33.44,1,no,southeast,8240.5896
7,37,female,27.74,3,no,northwest,7281.5056
8,37,male,29.83,2,no,northeast,6406.4107


In [None]:
# indexing and selecting by row and column position (iloc slices exclude the end position)
sample.iloc[5:10, 0:3]

,age,sex,bmi
5,31,female,25.74
6,46,female,33.44
7,37,female,27.74
8,37,male,29.83
9,60,female,25.84
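
The two slicing styles used above behave differently at the end of the range, which is easy to miss. The following minimal sketch, on a hypothetical five-row frame named `toy`, contrasts `loc` and `iloc` and adds a boolean-mask selection, which the cells above do not cover.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {
        "age": [19, 18, 28, 33, 32],
        "sex": ["female", "male", "male", "male", "male"],
        "smoker": ["yes", "no", "no", "no", "no"],
    }
)

# Label-based slice: the end label 2 is included, so three rows come back.
by_label = toy.loc[0:2]

# Position-based slice: the end position 2 is excluded, so two rows come back.
by_position = toy.iloc[0:2]

# Boolean mask for rows combined with a list of column labels.
mask = (toy["age"] > 18) & (toy["smoker"] == "no")
adult_nonsmokers = toy.loc[mask, ["age", "sex"]]

print(len(by_label), len(by_position))
print(adult_nonsmokers)
```

Each condition in the mask needs its own parentheses because `&` binds more tightly than the comparison operators.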


In [None]:
# (Rows, Columns) of data
sample.shape

(1338, 7)

In [None]:
# Sort rows by age, then by sex (ascending by default)
sample.sort_values(["age", "sex"])

,age,sex,bmi,children,smoker,region,charges
31,18,female,26.315,0,no,northeast,2198.18985
46,18,female,38.665,2,no,northeast,3393.35635
50,18,female,35.625,0,no,northeast,2211.13075
102,18,female,30.115,0,no,northeast,21344.84670
161,18,female,36.850,0,yes,southeast,36149.48350
...,...,...,...,...,...,...,...
635,64,male,38.190,0,no,northeast,14410.93210
752,64,male,37.905,0,no,northwest,14210.53595
1051,64,male,26.410,0,no,northeast,14394.55790
1241,64,male,36.960,2,yes,southeast,49577.66240
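
`sort_values` also accepts a sort direction per key. As a small sketch on hypothetical data (the `toy` frame is an assumption), the example below sorts by age ascending and, within each age, by charges descending.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {
        "age": [18, 64, 18, 64],
        "charges": [1725.55, 14410.93, 2198.19, 49577.66],
    }
)

# One boolean per sort key: age ascending, charges descending.
ordered = toy.sort_values(["age", "charges"], ascending=[True, False])

# sort_values returns a new frame and keeps the original index labels;
# reset_index(drop=True) renumbers the rows if that is preferred.
renumbered = ordered.reset_index(drop=True)

print(ordered)
print(renumbered)
```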


**c) Describe attributes of data, checking data types of each column**

In [None]:
# Describe attributes of data
sample.describe()

,age,bmi,children,charges
count,1338.0,1338.0,1338.0,1338.0
mean,39.207025,30.663397,1.094918,13270.422265
std,14.04996,6.098187,1.205493,12110.011237
min,18.0,15.96,0.0,1121.8739
25%,27.0,26.29625,0.0,4740.28715
50%,39.0,30.4,1.0,9382.033
75%,51.0,34.69375,2.0,16639.912515
max,64.0,53.13,5.0,63770.42801


In [None]:
# Checking data types of each column
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB
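
`info()` prints a report but returns nothing that can be reused. When the data types are needed programmatically, the `dtypes` attribute and `select_dtypes` are one option. The sketch below uses a hypothetical two-row frame named `toy`.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {"age": [19, 18], "sex": ["female", "male"], "bmi": [27.9, 33.77]}
)

# dtypes is a Series mapping each column name to its data type.
column_types = toy.dtypes

# Split the columns into numeric and non-numeric groups.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(column_types)
print(numeric_cols, other_cols)
```

Note that `describe()` summarises only the numeric columns by default; `describe(include="all")` adds the text columns as well.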


**d) Counting unique values of data, format of each column, converting variable data type (e.g. from long to short, vice versa)**

In [None]:
# Counting occurrences of each unique row (sample.nunique() gives the number of unique values per column)
sample.value_counts()

age  sex     bmi     children  smoker  region     charges    
19   male    30.590  0         no      northwest  1639.56310     2
47   male    29.830  3         no      northwest  9620.33070     1
48   female  25.850  3         yes     southeast  24180.93350    1
             22.800  0         no      southwest  8269.04400     1
47   male    47.520  1         no      southeast  8083.91980     1
                                                                ..
31   female  25.740  0         no      southeast  3756.62160     1
             23.600  2         no      southwest  4931.64700     1
             21.755  0         no      northwest  4134.08245     1
30   male    44.220  2         no      southeast  4266.16580     1
64   male    40.480  0         no      southeast  13831.11520    1
Length: 1337, dtype: int64
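
Calling `value_counts()` on the whole frame counts distinct rows, which is why almost every count above is 1. Per-column counts are usually more informative. A minimal sketch on a hypothetical frame named `toy`:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {
        "region": ["southwest", "southeast", "southeast", "northwest", "southeast"],
        "smoker": ["yes", "no", "no", "no", "no"],
    }
)

# Number of distinct values in each column.
distinct_per_column = toy.nunique()

# Frequency of each distinct value in a single column.
region_counts = toy["region"].value_counts()

# The distinct values themselves, in order of first appearance.
smoker_values = toy["smoker"].unique()

print(distinct_per_column)
print(region_counts)
print(smoker_values)
```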

In [None]:
# Converting variable data type (float64 to int64; astype truncates the fractional part and returns a new Series)
sample.charges.astype(int)

0       16884
1        1725
2        4449
3       21984
4        3866
        ...  
1333    10600
1334     2205
1335     1629
1336     2007
1337    29141
Name: charges, Length: 1338, dtype: int64
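
The task statement also asks for converting "from long to short" and back. One possible reading is narrowing a 64-bit integer column to a smaller integer type and widening it again. The sketch below assumes that reading and uses a hypothetical frame named `toy`; it also shows rounding before the float-to-integer conversion, since `astype` alone truncates.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
toy = pd.DataFrame(
    {"age": [19, 18, 28], "charges": [16884.924, 1725.5523, 4449.462]}
).astype({"age": "int64", "charges": "float64"})

# Long to short: confirm the values fit in int8 before narrowing.
limits = np.iinfo(np.int8)
fits = toy["age"].between(limits.min, limits.max).all()
age_short = toy["age"].astype("int8") if fits else toy["age"]

# Alternatively, let pandas pick the smallest integer type that fits.
age_auto = pd.to_numeric(toy["age"], downcast="integer")

# Short to long: widening is always safe.
age_long = age_short.astype("int64")

# Float to integer: astype truncates, so round first if rounding is intended.
charges_truncated = toy["charges"].astype("int64")
charges_rounded = toy["charges"].round().astype("int64")

print(age_short.dtype, age_auto.dtype, age_long.dtype)
print(charges_truncated.tolist(), charges_rounded.tolist())
```

Checking the range first matters because values outside the target type's range cannot be represented after narrowing.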

**e) Identifying missing values and filling in the missing values**

In [None]:
# Finding the missing values
sample.isnull().sum()

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64

In [None]:
# Filling the missing values with 0 (returns a new DataFrame; this data set has none, so the values are unchanged)
sample.fillna(0)

,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500
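
Because the insurance data has no missing values, the cells above cannot show filling in action. The sketch below uses a hypothetical frame named `toy` with gaps inserted on purpose. Filling numeric columns with the median or mean and a text column with its most frequent value is one common choice, assumed here for illustration rather than taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with deliberate gaps, used only for illustration.
toy = pd.DataFrame(
    {
        "age": [19, np.nan, 28, 33],
        "bmi": [27.9, 33.77, np.nan, 22.705],
        "region": ["southwest", None, "southeast", "southeast"],
    }
)

# Count the missing values in each column.
missing_per_column = toy.isnull().sum()

# Fill each column with its own replacement value.
filled = toy.fillna(
    {
        "age": toy["age"].median(),
        "bmi": toy["bmi"].mean(),
        "region": toy["region"].mode()[0],
    }
)

print(missing_per_column)
print(filled)
```

`fillna` returns a new frame, so `toy` still contains its gaps afterwards; assign the result back to keep the filled values.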
