# Introduction to Python for Data Science: The Basics
 
Created By: The TTLAB Team

<div>
<img src="Data/Images/TTLAB.png" width="100"/>
</div>

## What is Python? 
* Not a snake (in this case)
* Very popular programming language that is relatively easy to read
* Comes with a generous standard library that supports commonplace programming tasks
* Runs on all major platforms: macOS, Windows, Linux, Unix
* It's free!
* Created by Guido van Rossum, who began work on it in 1989 (first released in 1991)

## Why use Python in Data Science?

* Gentle learning curve and easy-to-understand syntax
* One of the largest collections of popular data science libraries
    * pandas, NumPy, RAPIDS, scikit-learn, and many more
* Thriving open-source community
    * You may find libraries on GitHub that address your problems!
* Graphics and visualization possibilities
    * Packages like Matplotlib, Bokeh, and Plotly greatly help the visualization workflow
* Deploying models is relatively easy

## Other languages for Data Science

<div>
<img src="Data/Images/UsedProgrammingLanguages.png" width="600"/>
</div>

*Data from the 2018 Kaggle ML & DS Survey*

<div>
<img src="Data/Images/beforeAfter.jpg" width="500"/>
</div>

## Getting Started

* [Python 3](https://www.python.org/downloads/)
* Integrated Development Environment
    * [PyCharm](https://www.jetbrains.com/pycharm/), [VS Code](https://code.visualstudio.com/), [Atom](https://atom.io/) ... 
* Collection of Data Science libraries
    * pandas, NumPy, Matplotlib ...

## *Easier* Getting Started

* Utilize the [Anaconda Distribution](https://www.anaconda.com/distribution/)
    * Anaconda is an easy-to-set-up data science distribution with a library, dependency, and environment manager.
    * Includes Jupyter Notebook, which serves as a test pad for writing and running Python code.

**OR**
* Use [Google Colab](https://colab.research.google.com/)
    * Free Jupyter Notebook environment that runs in the cloud.

___
## Basic Commands

### Printing out information
The classic "Hello World" program.

In [None]:
print("Hello World")

### Comments

Comments are snippets of text that are placed within the code and typically contain short descriptions about the code. 

* Comments in Python are defined by the '#' character

In [None]:
# Author: Darren R.
print("Hello World")

### Variables

A variable is a symbolic name that refers to a stored value and can be used throughout the program.
 * Python is dynamically typed: a variable's type is determined at runtime from the value assigned to it.

Most used data types: 
* Integers
* Floating Point Numbers
* Strings 
* Booleans

In [None]:
# Integer
age = 24

# Floating Point Number
height = 177.8

# String
name = "Darren Ramsook"

# Boolean
license = True

In [None]:
# An f-string inserts the variable values directly into the text
print(f"My name is {name} and I am {age} years old.")
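Because types are determined at runtime, the same name can be bound to values of different types over time. The following is a minimal sketch of that behavior; the variable names `value` and `is_text` are chosen here only for illustration.

```python
# Dynamic typing: the type belongs to the value, not to the variable name.
value = 24                 # bound to an integer
print(type(value))         # <class 'int'>

value = 177.8              # the same name can be rebound to a float
print(type(value))         # <class 'float'>

value = "Darren"           # ... or to a string
print(type(value))         # <class 'str'>

# isinstance() checks the type of a value at runtime
is_text = isinstance(value, str)
print(is_text)             # True
```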

### Control Statements

Control statements determine which instructions a program runs and how many times it runs them, which allows for case handling (branching) and looping.

* if ... else
* if ... elif ... else
* while ...
* for ... in ...

In [None]:
# if ... else

if license:  # a Boolean can be tested directly, without "== True"
    print("You can drive")
else:
    print("You cannot drive")

In [None]:
# if ... elif ... else

favColor = "Violet"

if favColor == "Green":
    print("Plants")
elif favColor == "Blue":
    print("Sky")
elif favColor == "Red":
    print("Rose")
else:
    print("Pick a new color")

In [None]:
# while ...
count = 0 
while count < 10:
    print(count)
    count += 1

In [None]:
# for ... in ...
for i in range(0,10):
    print(i)
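The statements above can also be combined. As a small sketch (the names `even_total` and `count` are illustrative only), the first loop nests an `if` inside a `for`, and the second shows that a `while` loop can be ended early with `break`:

```python
# Sum the even numbers below 10 by combining a for loop with an if test.
even_total = 0
for i in range(10):
    if i % 2 == 0:       # a remainder of 0 means i is even
        even_total += i
print(even_total)        # 20

# A while loop can stop early with break.
count = 0
while True:
    if count == 3:
        break            # leave the loop once count reaches 3
    count += 1
print(count)             # 3
```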

### Common Data Structures

* List: Collection which is ordered and mutable.
* Tuple: Collection which is ordered and immutable.
* Dictionary: Collection of key-value pairs that is mutable and indexed by key (keeps insertion order since Python 3.7).

In [None]:
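# List

studentsAge = [10, 11, 12, 13]

# List indexing
print(studentsAge[0])    # first element: 10
print(studentsAge[-1])   # last element: 13
print(studentsAge[1:3])  # slice from index 1 up to (not including) 3: [11, 12]

# Lists are mutable, so an element can be reassigned
studentsAge[0] = 25
print(studentsAge[0])    # 25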
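Mutability also means a list can grow and shrink after it is created. A brief sketch, using a fresh list named `student_ages` (a name assumed here for illustration) and a few built-in list operations:

```python
student_ages = [10, 11, 12, 13]

student_ages.append(14)      # add an item to the end
print(len(student_ages))     # 5

student_ages.remove(10)      # remove the first item equal to 10
print(student_ages)          # [11, 12, 13, 14]

# A slice with a step of -1 returns a reversed copy
print(student_ages[::-1])    # [14, 13, 12, 11]
```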

In [None]:
# Tuple

studentInfo = ("John Smith", 23, True)

print(studentInfo[0])
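To see what "immutable" means in practice, the sketch below tries to reassign a tuple element and catches the resulting `TypeError`. It also shows tuple unpacking. The names `error_raised`, `student_name`, `student_age`, and `student_flag` are assumptions made for this example; the source does not say what the third field represents.

```python
studentInfo = ("John Smith", 23, True)

error_raised = False
try:
    studentInfo[1] = 24          # tuples do not support item assignment
except TypeError as error:
    error_raised = True
    print("Tuples cannot be modified:", error)

# Unpacking assigns each element to its own variable
student_name, student_age, student_flag = studentInfo
print(student_name, student_age, student_flag)
```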

In [None]:
# Dictionary

rating = {}
rating["Python"] = 83
rating["SQL"] = 44
rating["R"] = 36
print(rating)
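A few more common dictionary operations, sketched with the same values as above. The key `"Julia"` is a hypothetical example of a missing key, and the updated SQL value is made up for illustration.

```python
rating = {"Python": 83, "SQL": 44, "R": 36}

print(rating["Python"])         # look up a value by key: 83
print(rating.get("Julia", 0))   # missing key, so the default 0 is returned

rating["SQL"] = 45              # assigning to an existing key updates its value

# items() yields key-value pairs in insertion order
for language, score in rating.items():
    print(language, score)

print("R" in rating)            # membership test on keys: True
```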

## Extending Functionality: Using Popular Data Science Packages

* NumPy : package for scientific computing with fast N-dimensional arrays
* pandas : easy-to-use data structures and data analysis tools
* Matplotlib : plotting library built on NumPy arrays

Once these packages are installed (they are included by default with the Anaconda Distribution), you can bring them into your workflow with the *import* statement.

In [1]:
import numpy as np
import pandas as pd 
import matplotlib as mpl
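The `import ... as ...` pattern works the same way for any module: the alias becomes a short prefix for everything the module provides. As a minimal sketch that runs without extra installs, the example below uses the standard library's `statistics` module; the alias `stats` and the list `ages` are chosen here only for illustration.

```python
import statistics as stats   # standard-library module, aliased as "stats"

ages = [10, 11, 12, 13]

# Functions are reached through the alias, just like np.<name> or pd.<name>
print(stats.mean(ages))      # 11.5
print(stats.median(ages))    # 11.5
```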