# Importing Data

## Working With Data in Python
### Importing Data

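In [None]:
import statsmodels.api as sm

# Fetch mtcars from the Rdatasets repository (requires network access)
dataset = sm.datasets.get_rdataset("mtcars")
mtcars = dataset.data    # the data as a pandas DataFrame
print(dataset.__doc__)   # the dataset's description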

### Querying Data
Retrieve the first 5 rows of the mpg column in the mtcars data frame.

In [None]:
mtcars['mpg'][0:5]

Examine the dimensions of the mtcars data frame using the shape attribute.

In [None]:
mtcars.shape

Retrieve some basic information about the mtcars data frame using the info() method.

In [None]:
mtcars.info()

Get the unique values of the gear column.

In [None]:
mtcars.gear.unique() 

Examine the columns of the mtcars data frame.

In [None]:
mtcars.columns

Preview the first few rows of the mtcars data frame.

In [None]:
mtcars.head()

Preview the last few rows of the mtcars data frame.

In [None]:
mtcars.tail()

View the first two rows of the mtcars data frame.

In [None]:
mtcars[0:2]

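View the first two rows again, this time omitting the start index.

In [None]:
mtcars[:2]

View the fourth and fifth rows (positions 3 and 4).

In [None]:
mtcars[3:5]

View the cyl column of the mtcars data frame.

In [None]:
mtcars['cyl']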


View the gear column of the mtcars data frame as an attribute of the data frame.

In [None]:
mtcars.gear

Retrieve rows from the mtcars data frame that have an mpg greater than 20.

In [None]:
mtcars[mtcars.mpg > 20]
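
The filter above works in two steps: the comparison produces a Boolean mask with one value per row, and indexing with that mask keeps only the rows marked True. The sketch below illustrates this on a small hand-made frame; the `cars` data and its row labels are hypothetical, not the real mtcars values.

```python
import pandas as pd

# Hypothetical toy data, not the real mtcars values
cars = pd.DataFrame(
    {"mpg": [21.0, 18.7, 24.4, 14.3], "cyl": [6, 8, 4, 8]},
    index=["car_a", "car_b", "car_c", "car_d"],
)

# The comparison yields one True/False value per row
mask = cars["mpg"] > 20

# Indexing with the mask keeps only the rows marked True
efficient = cars[mask]

# Conditions combine with & (and) and | (or); each needs its own parentheses
efficient_six = cars[(cars["mpg"] > 20) & (cars["cyl"] == 6)]

print(mask)
print(efficient)
print(efficient_six)
```

The parentheses around each condition matter because `&` binds more tightly than the comparison operators.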
 

### Use of iloc
pandas provides the iloc indexer, which selects rows and columns by integer position using a syntax similar to array slicing.

*To get only the first 5 rows and the first 4 columns:*

In [None]:
mtcars.iloc[:5,:4]

In [None]:
mtcars.iloc[:2]

In [None]:
mtcars.iloc[:4,2:4]
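
As a minimal sketch of how positional selection behaves, the example below applies iloc to a small hand-made frame (the `cars` data is hypothetical). It shows that slice ends are exclusive, that a pair of integers returns a single value, and that negative positions and lists of positions are also accepted.

```python
import pandas as pd

# Hypothetical toy data, not the real mtcars values
cars = pd.DataFrame(
    {"mpg": [21.0, 18.7, 24.4, 14.3], "cyl": [6, 8, 4, 8], "hp": [110, 175, 62, 245]},
    index=["car_a", "car_b", "car_c", "car_d"],
)

top_left = cars.iloc[:2, :2]         # rows 0-1 and columns 0-1; the end is exclusive
single = cars.iloc[1, 2]             # the value at row 1, column 2
last_row = cars.iloc[-1]             # negative positions count from the end
picked = cars.iloc[[0, 2], [0, 2]]   # lists select arbitrary positions

print(top_left)
print(single)
print(last_row)
print(picked)
```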

## Calculating Basic Statistics of the Data
##### Mean
Calculate the mean of the cars' mpg.

In [None]:
mtcars.mpg.mean()

##### Median
Calculate the median of the cars' mpg.

In [None]:
mtcars.mpg.median()

##### Mode
Calculate the mode of the cars' mpg.

In [None]:
mtcars.mpg.mode() 
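
Unlike mean() and median(), mode() returns a Series rather than a single number, because several values can tie for the highest count. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical values in which 21.0 and 22.8 each appear twice
values = pd.Series([15.2, 21.0, 21.0, 22.8, 22.8, 30.4])

modes = values.mode()        # every value tied for the highest count, sorted
first_mode = modes.iloc[0]   # pick one value if a single number is needed

print(modes)
print(first_mode)
```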

##### Quantile
Calculate the value that cuts off the first 25 percent of the data when it is sorted in ascending order. The quantile() method defaults to q=0.5 (the median), so the 0.25 must be passed explicitly.

In [None]:
mtcars['mpg'].quantile(0.25)
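
The role of the q argument can be checked on a small series. The sketch below uses made-up values; by default pandas interpolates linearly between neighbouring observations.

```python
import pandas as pd

# Hypothetical values, already sorted for readability
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

first_quartile = values.quantile(0.25)        # cuts off the lowest 25 percent
default_q = values.quantile()                 # q defaults to 0.5, the median
quartiles = values.quantile([0.25, 0.5, 0.75])  # a list of q values returns a Series

print(first_quartile, default_q)
print(quartiles)
```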

##### Standard Deviation
Calculate the standard deviation of the mpg.

In [None]:
mtcars['mpg'].std()

##### Variance
Calculate the variance of the mpg.

In [None]:
mtcars['mpg'].var()
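
Note that pandas computes the sample statistics by default: std() and var() divide by n - 1 (ddof=1). Passing ddof=0 gives the population versions. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical values with mean 5 and squared deviations summing to 32
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = values.var()            # divides by n - 1 = 7
population_var = values.var(ddof=0)  # divides by n = 8
population_std = values.std(ddof=0)  # square root of the population variance

print(sample_var, population_var, population_std)
```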

##### Summary Statistics of the Dataset
Calculate a distribution summary (count, mean, standard deviation, minimum, quartiles, and maximum) for every numeric column of mtcars.

In [None]:
mtcars.describe() 

# Features and Records
The columns in a data frame are known as its features. The rows are known as records or observations. 
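
In pandas terms, the shape attribute reports the number of records first and the number of features second. A minimal sketch on a hand-made frame (the `cars` data is hypothetical):

```python
import pandas as pd

# Hypothetical toy data: 4 records (rows) described by 3 features (columns)
cars = pd.DataFrame(
    {"mpg": [21.0, 18.7, 24.4, 14.3], "cyl": [6, 8, 4, 8], "hp": [110, 175, 62, 245]},
    index=["car_a", "car_b", "car_c", "car_d"],
)

n_records, n_features = cars.shape
feature_names = list(cars.columns)   # one name per feature
first_record = cars.iloc[0]          # one record, holding a value for each feature

print(n_records, n_features)
print(feature_names)
print(first_record)
```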

# Standard scaler

The mean values for the respective features in our initial dataset or table are 29.3, 92, and 38. To put every feature on the same scale, that is, to give each one a zero mean and unit variance, we apply the standard scaler:

In [None]:
from sklearn import preprocessing
stand_scalar =  preprocessing.StandardScaler().fit(mtcars)     
results = stand_scalar.transform(mtcars)     
print(results)

In [None]:
mtcars.iloc[:,1].mean()

In [None]:
results[:, 1].mean()  # mean of the second column after scaling, approximately 0

In [None]:
results.shape
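
The arithmetic behind standardization can be reproduced by hand. The sketch below uses NumPy on a made-up array `X` and assumes the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses. Each column is shifted by its own mean and divided by its own standard deviation, so the zero mean holds per column, not per row.

```python
import numpy as np

# Hypothetical data: 3 records, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

col_mean = X.mean(axis=0)   # one mean per column
col_std = X.std(axis=0)     # population standard deviation (ddof=0) per column

Z = (X - col_mean) / col_std   # standardized values

print(Z)
print(Z.mean(axis=0))   # approximately 0 for every column
print(Z.std(axis=0))    # approximately 1 for every column
```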

# Min-max scaler
Min-max scaling uses each feature's minimum and maximum to rescale the data into a fixed range. The default range is 0 to 1, although other ranges can be set through feature_range:

In [None]:
scaled_values = preprocessing.MinMaxScaler(feature_range=(0,1))
results = scaled_values.fit(mtcars).transform(mtcars)  # fit and transform the mtcars data frame
print(results)
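
Min-max scaling can also be written out by hand. The sketch below uses NumPy on a made-up array `X`; the target range `(low, high)` is a hypothetical choice that plays the same role as feature_range.

```python
import numpy as np

# Hypothetical data: 3 records, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

col_min = X.min(axis=0)   # one minimum per column
col_max = X.max(axis=0)   # one maximum per column

# Map each column onto [0, 1]
unit = (X - col_min) / (col_max - col_min)

# Stretch and shift onto another range, here [-1, 1]
low, high = -1.0, 1.0
rescaled = unit * (high - low) + low

print(unit)
print(rescaled)
```

A column whose values are all identical would make the denominator zero, so a real implementation has to handle that case separately.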