#Week-1 (Lab) Basics of Statistical Analysis

Python-based computing environments are among the most popular options for data scientists. Python provides standard, built-in functionality that can be extended by importing third-party packages. One of the most common third-party libraries is **NumPy**, which we will use for this week's lab task. We will also use the **SciPy** library for some of the calculations.

The labs in this module will use Python-powered Jupyter notebooks to create interactive data science computing environments. A Jupyter notebook consists of code cells, which contain code that can be executed, and text cells, which are used for documentation purposes. Jupyter notebooks are powered by an engine, in our case Python, that can run on your machine (local runtime) or on a remote, cloud-based machine (hosted runtime). The instructions in each lab will assume that you are using Google Colab, but you should be able to use other runtimes too. If you choose another runtime, you might need to make some minor tweaks, such as installing a library that the Google Colab environment provides by default.




#Introductory Instructions about handling **Text block** and **Code block** in Python:

Click on this text, and the cell that contains it will be highlighted. Double-click on it, and you will be able to edit it.

Text cells, also known as markdown cells, allow you to document your work. You can use different fonts, add titles, include resources such as images and even write nice mathematical expressions, for instance:

$\begin{equation}
f(x) = w_0 + w_1 x
\end{equation}$

To write such nice mathematical expressions, you need to use a different markup language called LaTeX.
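
As a further illustration of this markup, the arithmetic mean used later in this lab could be typeset as follows. The symbols $\bar{x}$, $x_i$ and $n$ are our own notation for the mean, the individual values and the number of values; they are not taken from the lab files.

```latex
$\begin{equation}
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{equation}$
```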

Code cells, such as the one below this text cell, contain code that can be executed. In the example below, we first create the variable mystring by assigning the string value "Welcome to your first Data Science Session!" and then we print it. You can run it by clicking on the play button on the left of the block. Any output will appear underneath the block.

In [None]:
mystring = "Welcome to your first Data Science Session!"
print(mystring)

Welcome to your first Data Science Session!


Notice that if you are using Colab, it might take a while to run your first cell. The reason is that Google needs to allocate some computing resources first, so there is nothing to worry about.

Here is another example, where we create an integer variable myint:

In [None]:
myint = 25

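You will have noticed that running the cell didn't produce any output. The variable has, however, been created, as you can see if you use the command `whos`. The listing below was captured after the whole notebook had been run, so it also includes variables created in later sections.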

In [None]:
whos

Variable             Type          Data/Info
--------------------------------------------
CType_Var            ndarray       8: 8 elems, type `int64`, 64 bytes
HP_Var               ndarray       8: 8 elems, type `int64`, 64 bytes
Mean_of_HP           float64       103.625
Mean_of_MpG          float64       22.2625
Median_of_HP         float64       110.0
Median_of_MpG        float64       22.9
Mode_of_CType        ModeResult    ModeResult(mode=array([6]), count=array([5]))
Mode_of_HP           ModeResult    ModeResult(mode=array([110]), count=array([5]))
Mode_of_MpG          ModeResult    ModeResult(mode=array([23.]), count=array([4]))
MpG_Var              ndarray       8: 8 elems, type `float64`, 64 bytes
S_Deviation_of_HP    float64       0.9068317098558034
S_Deviation_of_MpG   float64       0.9068317098558034
Variance_of_HP       float64       69.984375
Variance_of_MpG      float64       0.8223437499999999
myint                int           25
mystring             str           Welc

Using print, you can check the value of myint:

In [None]:
print("The value I want to print is:", myint)

The value I want to print is: 25


# Importing Library: NumPy

NumPy is the fundamental Python library for numerical computing. 

In [None]:
import numpy as np
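
To get a feel for what NumPy offers, here is a minimal sketch on a small made-up array. The values and the names `demo` and `doubled` are purely illustrative. It shows that an array knows its own size and data type, and that arithmetic is applied to every element at once.

```python
import numpy as np

# Hypothetical values, used only for illustration
demo = np.array([2, 4, 6, 8])

print(demo.size)    # number of elements in the array
print(demo.dtype)   # data type shared by all elements
doubled = demo * 2  # arithmetic is applied element by element
print(doubled)
```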

#How to Upload Files, e.g. Dataset(s)
In this module we will focus on how to extract knowledge from data. Hence, at some point we will need to make our data available to the machine running our code. If you are running your Jupyter notebook on Google Colab, then we'll need to make sure Colab's runtime has access to our data.

There are different ways to make our data accessible, but here we will present the easiest one:

- Find the `Files` icon on the left pane and click on it.
- Click on the `Upload` icon.
- Select the file that you want to upload from your local machine.

Note that you will not be able to upload a file if your notebook has not been allocated any Colab computing resources. Also note that your notebook will be disconnected if it is left idle for too long, and in any case after 12 hours. Disconnection will result in your uploaded data being lost.

We will upload the CSV file `HP.csv`. Once uploaded, you should be able to see it on the left-hand pane.
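
Before loading the file, it can help to confirm from code that the upload succeeded. The sketch below assumes the file was uploaded to the runtime's current working directory; `file_is_uploaded` is a hypothetical helper written for this example, not part of Colab or NumPy.

```python
from pathlib import Path


def file_is_uploaded(filename):
    """Return True if `filename` exists in the current working directory."""
    return Path(filename).is_file()


if file_is_uploaded("HP.csv"):
    print("HP.csv found, ready to load.")
else:
    print("HP.csv not found, please upload it first.")
```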

In [None]:
HP_Var = np.loadtxt("./HP.csv", dtype=int)
print (HP_Var)

[110 110  93  96  90 110 110 110]


**Calculating the "Mean" of HP:**

In [None]:
Mean_of_HP= np.mean(HP_Var)
print (Mean_of_HP)

103.625
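
To see what `np.mean` does, the following sketch recomputes the mean by hand as the sum of the values divided by their count. The array `hp` simply copies the values printed for `HP_Var` above, so the example runs without the CSV file.

```python
import numpy as np

# Values copied from the HP_Var output above
hp = np.array([110, 110, 93, 96, 90, 110, 110, 110])

manual_mean = hp.sum() / hp.size  # sum of the values divided by their count
print(manual_mean)
print(np.mean(hp))  # same result from NumPy's built-in function
```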


#Question:1
Upload `MpG.csv` (Miles per Gallon of Cars) and calculate the "Mean" of Miles per Gallon.

In [None]:
#feel free to copy and paste the appropriate code segments from the previous sections here.
MpG_Var = np.loadtxt("./MpG.csv", dtype=float)
print (MpG_Var)
Mean_of_MpG= np.mean(MpG_Var)
print (Mean_of_MpG)

[21.  21.  22.8 21.3 23.  23.  23.  23. ]
22.2625


**Calculating Median of "HP":**

In [None]:
Median_of_HP= np.median(HP_Var)
print(Median_of_HP)
np.sort(HP_Var)

110.0


array([ 90,  93,  96, 110, 110, 110, 110, 110])
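
The sorted array helps explain the result: with an even number of values, the median is the average of the two middle values. A minimal sketch of that rule, again using a copy of the `HP_Var` values (the names `ordered`, `middle_low` and `middle_high` are our own):

```python
import numpy as np

# Values copied from the HP_Var output above
hp = np.array([110, 110, 93, 96, 90, 110, 110, 110])

ordered = np.sort(hp)
n = ordered.size
# n is even here, so the median is the mean of the two middle values
middle_low = ordered[n // 2 - 1]
middle_high = ordered[n // 2]
manual_median = (middle_low + middle_high) / 2
print(manual_median)
print(np.median(hp))  # same result from NumPy's built-in function
```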

#Question:2
Calculate the "Median" of Miles per Gallon.

In [None]:
#feel free to copy and paste the appropriate code segments from the previous sections here.
Median_of_MpG= np.median(MpG_Var)
print(Median_of_MpG)
np.sort(MpG_Var)

22.9


array([21. , 21. , 21.3, 22.8, 23. , 23. , 23. , 23. ])

# Importing Library: SciPy

SciPy stands for Scientific Python. Some statistical calculations, such as the mode, require SciPy because NumPy has no direct function for calculating the mode.

The SciPy library contains several submodules. One of them is `stats`, which we use to calculate the mode.

The following example calculates the "Mode" of HP.

In [None]:
from scipy import stats

Mode_of_HP=stats.mode(HP_Var)
print(Mode_of_HP)

ModeResult(mode=array([110]), count=array([5]))
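
The result bundles the most frequent value with the number of times it occurs; its printed form may differ between SciPy versions. As a cross-check that does not depend on SciPy, the sketch below counts every distinct value with `np.unique` and picks the most frequent one. The array `hp` copies the `HP_Var` values shown earlier.

```python
import numpy as np

# Values copied from the HP_Var output above
hp = np.array([110, 110, 93, 96, 90, 110, 110, 110])

# Distinct values (sorted) and how often each one occurs
values, counts = np.unique(hp, return_counts=True)

# argmax returns the first maximum, so ties resolve to the smallest value
most_frequent = values[np.argmax(counts)]
print(most_frequent, counts.max())
```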


#Question:3
Upload the file `CType.csv` and calculate the "Mode" of the cylinder type of Cars.

In [None]:
#feel free to copy and paste the appropriate code segments from the previous sections here.
CType_Var = np.loadtxt("./CType.csv", dtype=int)
Mode_of_CType=stats.mode(CType_Var)
print(Mode_of_CType)

# The same call also works for the MpG data loaded earlier.
Mode_of_MpG=stats.mode(MpG_Var)
print(Mode_of_MpG)

ModeResult(mode=array([6]), count=array([5]))
ModeResult(mode=array([23.]), count=array([4]))


The following code segment calculates the "Variance". We will calculate the variance of the HP of cars.

**Calculating Variance of HP:**

In [None]:
Variance_of_HP= np.var(HP_Var)
print(Variance_of_HP)

69.984375
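
Note that `np.var` divides by the number of values `n` by default, which gives the population variance. The sketch below reproduces that definition by hand and, for comparison, computes the sample variance with `ddof=1`, which divides by `n - 1` instead. The array `hp` copies the `HP_Var` values shown earlier.

```python
import numpy as np

# Values copied from the HP_Var output above
hp = np.array([110, 110, 93, 96, 90, 110, 110, 110])

# Population variance: mean of the squared deviations from the mean
manual_var = np.mean((hp - hp.mean()) ** 2)
print(manual_var)  # agrees with np.var(hp)

# Sample variance: divide by n - 1 instead of n
sample_var = np.var(hp, ddof=1)
print(sample_var)
```

Which of the two is appropriate depends on whether the data are treated as the whole population or as a sample from it; this lab uses the default throughout.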


#Question:4
Calculate the variance of Miles per Gallon.

In [None]:
#feel free to copy and paste the appropriate code segments from the previous sections here.
Variance_of_MpG= np.var(MpG_Var)
print(Variance_of_MpG)

0.8223437499999999


The following code segment is an example of calculating the "Standard Deviation". We will calculate the standard deviation of the "HP" of Cars.

**Calculating the Standard Deviation of HP:**

In [None]:
S_Deviation_of_HP= np.std(HP_Var)
print(S_Deviation_of_HP)

8.365666440876064
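
The standard deviation is the square root of the variance, so it is expressed in the same units as the data. A short sketch confirming the relationship on a copy of the `HP_Var` values (the name `std_from_var` is our own):

```python
import numpy as np

# Values copied from the HP_Var output above
hp = np.array([110, 110, 93, 96, 90, 110, 110, 110])

std_from_var = np.sqrt(np.var(hp))  # square root of the population variance
print(std_from_var)
print(np.std(hp))  # same result from NumPy's built-in function
```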


#Question:5
Calculate the Standard Deviation of MpG (Miles per Gallon) of Cars.

In [None]:
#feel free to copy and paste the appropriate code segments from the previous sections here.
S_Deviation_of_MpG= np.std(MpG_Var)
print(S_Deviation_of_MpG)

0.9068317098558034


**Calculating the Range of a quantity:**
As you know, the "Range" of an attribute is the difference between its maximum value and its minimum value. Therefore, to calculate the range of any attribute in a dataset, we first need to find its maximum and minimum values.
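
A quick sketch of this definition on a small made-up array (the values in `sample` are hypothetical). NumPy's `np.ptp` ("peak to peak") computes the same quantity in a single call and is shown here only as a cross-check.

```python
import numpy as np

# Hypothetical values, used only for illustration
sample = np.array([3.5, 1.0, 4.5, 2.0])

largest = np.max(sample)
smallest = np.min(sample)
print(largest - smallest)  # range = maximum - minimum
print(np.ptp(sample))      # cross-check with NumPy's peak-to-peak function
```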


#Question:6
Calculate the Range of Miles per Gallon from the `MpG.csv` dataset. Python's built-in `range` generates sequences of integers and does not compute a statistical range, so you must first define a custom function (e.g. called `Range(x)`) to complete this task.

In [None]:
# Custom function: range = maximum value - minimum value
def Range(x):
    return np.max(x) - np.min(x)

rangempg = Range(MpG_Var)
print(rangempg)

rangehp = Range(HP_Var)
print(rangehp)




2.0
20
