# 1. Introduction to Python programming

Module M-227-04: Programming for Data Analytics

Instructor: prof. Dmitry Pavlyuk

## About the Course

* Programming for Data Analytics
* __6 credit points__ (9 ECTS): about 240 hours of studies (72 contact hours)
* The course is designed for major students with little or no programming experience and emphasizes writing programmes that can retrieve and manipulate large amounts of data.
* Primary language - __<span style="color:#9D2235">Python</span>__
* Learning-by-doing approach – weekly practices
* Assessment:
    * Independent project (50%)
    * Final practical exam on data processing (50%)

## Main topics

* Core Python
* Multidimensional computing (_NumPy_ library)
* Data wrangling (_pandas_ library)
* Data visualisation (_matplotlib_ library)
* Modelling (_statsmodels_, _scikit-learn_)

## Materials

1. McKinney, W. (2022). Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter. https://wesmckinney.com/book/
2. Downey, A. (2016). Think Python: How to Think Like a Computer Scientist. https://greenteapress.com/wp/think-python-2e/
3. Course demo notebooks. https://github.com/DmitryPavlyuk/python-da

## What is Programming?

Computer programming is the process of designing and building an executable computer program (software).

Classes of software:
* Word/spreadsheet processing software
* Computer games
* Web applications
* Mobile applications
* __<span style="color:#9D2235">Data processing software</span>__

# Python programming language

## Python: short history

* First released in 1991; Python 2.0 followed in 2000 (Python 2 is no longer supported) and Python 3.0 in 2008 – pretty old!
* TIOBE Popularity Index – Python is very popular now

![LanguagesTable.fw.png](attachment:LanguagesTable.fw.png)

Note how Python's popularity has changed over time!

![LanguagesPlot.fw.png](attachment:LanguagesPlot.fw.png)

## Python and R

Python and R are the preferred languages in Data Science, Data Analysis, and Machine Learning.

![Python-Vs-R.jpg](attachment:Python-Vs-R.jpg)

* Python is a general-purpose language (while R is mainly a statistical language), so Python can be integrated more easily into production environments.
* Python is better suited to machine learning, deep learning, and large-scale web applications.
* R is well suited to statistical learning and has powerful libraries for data experimentation and exploration.

## Types of programming languages

* __Low-level__: direct machine (computer) instructions
* __High-level__: computer-independent


#### Python is a high-level language

* __Declarative__: describe __what__ the programme does
* __Imperative__: describe __how__ the programme should work

#### Python is an imperative language
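
The difference between "how" and "what" can be illustrated with a small sketch. The list `numbers` and the variable names are made up for this example. The first version spells out every step (imperative style); the second uses a built-in function to state the desired result more directly, which Python also allows.

```python
numbers = [1, 2, 3, 4, 5]  # hypothetical example data

# Imperative style: describe HOW to compute the result, step by step
total = 0
for n in numbers:
    if n % 2 == 0:          # keep only even numbers
        total += n * n      # add the square to the running total

# More declarative style: describe WHAT is wanted (sum of squares of even numbers)
total_declarative = sum(n * n for n in numbers if n % 2 == 0)

print(total, total_declarative)
```

Both versions give the same result; the imperative loop is simply more explicit about the order of operations.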

* __Domain-specific__ (for particular application domain)
* __General-purpose__

#### Python is a general-purpose language

* __Compiled__: transform all the commands into a lower-level programme before execution
* __Interpreted__: translate every command and immediately execute it

#### Python is an interpreted language

## Programming environment

A software programme is a set of instructions that tell a computer to perform a particular task.

How to organise this set of instructions in Python:
* using __Command-line interface__ (CLI) – send instructions as lines of text

![pythonCLI.fw-2.png](attachment:pythonCLI.fw-2.png)

How to organise this set of instructions in Python:
* using __Scripts__ – compose a file with instructions

![pythonScript.fw.png](attachment:pythonScript.fw.png)
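
A minimal sketch of what such a script file might contain is given below. The file name `ball.py` and the function name `main` are assumptions made for illustration, not part of the course materials.

```python
# ball.py (hypothetical file name): a script is a plain text file with Python instructions

def main():
    radius = 5
    diameter = 2 * radius
    print("Diameter:", diameter)
    return diameter

# Run main() only when the file is executed as a script,
# not when it is imported from another file
if __name__ == "__main__":
    main()
```

Assuming the file is saved as `ball.py`, it can be run from the command line with `python ball.py`.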

How to organise this set of instructions in Python:
* using __Notebooks__ – interactively combine instructions, their results, and Markdown text

## Local environment

Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.

* Install Anaconda with the latest Python (currently 3.9) - https://www.anaconda.com/ 
    * Use the Anaconda Prompt from the Start menu to run “python” – the command-line interface
    * Use “Jupyter Notebook” to start an interactive notebook editor
    * Use “Spyder” to start an environment for Python script editing
    
In this course we will mostly use Jupyter notebooks.

## Cloud environment

* Register at Google Colab https://colab.research.google.com/ – a cloud environment for developing and running Python notebooks in your browser
* Optional: register at GitHub https://github.com/ for cloud storage of software and version control

All my materials are available on GitHub https://github.com/DmitryPavlyuk/python-da (request access)

# First Python code

Calculate the core characteristics of a ball with a given radius

In [1]:
radius = 5

#### Diameter:

$d=2r$

In [2]:
d = 2 * radius
print(d)

10


## Value of $\pi$

Importing the ___math___ module

In [3]:
import math

In [4]:
print(math.pi)

3.141592653589793
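
The `math` module provides more than `pi`. As a brief sketch, the cell below shows two other standard members (`math.sqrt` and `math.tau`) and an alternative import form; the variable names are chosen only for illustration.

```python
import math

root = math.sqrt(16)      # square root, always returned as a float
print(root)

full_turn = math.tau      # the constant tau, equal to 2 * pi
print(full_turn)

# Alternative: import a single name so it can be used without the "math." prefix
from math import pi
print(pi == math.pi)
```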


### Calculating ball characteristics

#### Volume:

$V=\frac{4}{3}\pi r^3$

In [5]:
volume = 4/3*math.pi*radius**3
print(volume)

523.5987755982989
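
The volume expression relies on Python's operator precedence: `**` is evaluated before `*` and `/`, and `*` and `/` are then applied from left to right. The sketch below makes this grouping explicit with parentheses and prints the result rounded to two decimals using an f-string; the name `volume_explicit` is introduced only for this example.

```python
import math

radius = 5

# Expression as written in the cell above
volume = 4/3*math.pi*radius**3

# The same expression with the evaluation order made explicit
volume_explicit = ((4 / 3) * math.pi) * (radius ** 3)
print(volume == volume_explicit)

# Formatted output with two decimal places
print(f"Volume: {volume:.2f}")
```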


#### Surface area:

$S = 4 \pi r^2$

In [6]:
area = 4*math.pi*radius**2
print(area)

314.1592653589793
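
The three calculations above can be collected into a single reusable function. This is only a sketch: the function name `ball_characteristics` and the returned tuple layout are assumptions for illustration. The last line checks the results against the identity $V = \frac{S r}{3}$, which follows directly from the two formulas.

```python
import math

def ball_characteristics(radius):
    """Return the diameter, volume, and surface area of a ball."""
    diameter = 2 * radius
    volume = 4 / 3 * math.pi * radius ** 3
    area = 4 * math.pi * radius ** 2
    return diameter, volume, area

radius = 5
d, volume, area = ball_characteristics(radius)
print(d, volume, area)

# Sanity check: V = S * r / 3 (compared with a tolerance because of floating-point rounding)
print(math.isclose(volume, area * radius / 3))
```

Calling the function with a different radius recomputes all three values without repeating the formulas.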


# Thank you