# Python for machine learning

## Fundamentals of machine learning

- ML is everywhere in the modern world
- Machines can do repetitive tasks quickly
- We traditionally give machines input + instructions -> output
- 1959: Arthur Samuel - can machines infer logic instead of being given instructions?
- Could we instead give them just the input data plus the outputs of previously finished tasks, and let them work out the instructions that best reproduce those outputs? E.g., is a linear model most appropriate?
- Once a model is trained, we could just give it input data
- This is supervised learning -> applications include spam filtering, image analysis, and text prediction
- Unsupervised learning -> we just give it input data, and ask it to find patterns -> e.g., movie recommendations on Netflix
- Reinforcement learning -> positive or negative feedback, to learn the optimal path to take given the environment
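
To make the "input + previous output -> learned instructions" idea concrete, here is a minimal sketch that fits a linear model by ordinary least squares using only the standard library. The data values and the names `fit_line` and `predict` are hypothetical, chosen purely for illustration.

```python
# Hypothetical toy data: inputs and the outputs of previously finished tasks.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]


def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": the machine infers the instructions (slope and intercept).
slope, intercept = fit_line(xs, ys)


def predict(x):
    """Once trained, the model only needs input data."""
    return slope * x + intercept


print(slope, intercept, predict(6.0))
```

Whether a straight line is the right model is exactly the question raised above; this sketch simply assumes it is.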

### Terminology

- AI is a catch-all term -> all ML is AI, not all AI is ML
- ML encompasses unsupervised, supervised, and reinforcement learning paradigms (the three core categories of ML)
- A further term within machine learning is deep learning -> all deep learning is ML, not all ML is DL

#### Unsupervised learning
- the process of building descriptive models
- identify patterns in unlabelled data
- used to summarise and group data in new ways
- useful to uncover patterns that might be informative for business purposes
- e.g., these people have x behaviour, these have y -> offer x this, and y this
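
One common way to group unlabelled data is k-means clustering. The sketch below is a bare-bones version in plain Python, not a production implementation: the customer data, the choice of two clusters, and the starting centroids are all assumptions made for illustration.

```python
import math

# Hypothetical unlabelled data: (visits per month, average spend) per customer.
points = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Assumption: start from one point in each apparent group.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The two resulting groups correspond to the "x behaviour" and "y behaviour" customers in the example above; no labels were needed to find them.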

#### Supervised learning 
- training a predictive model
- learn patterns from previously labelled data, and then:
- assign label to unlabelled data, based on the historical data
- input = independent variable, output = dependent variable. Together, these make training data.
- Example: previous performance of some gamblers, with their preferences and cash amount (independent variables) + profit or loss for the business (dependent variable). Train the model on this.
- Assess performance of the model -> give it only input data, hiding the outcome. What does it predict? 
- This gives the predictive accuracy, e.g. 99%. 
- A model is "learning" if its performance at a task improves with experience
- From this we can take E (experience), T (the class of task), and P (the performance measure, e.g. predictive accuracy)
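
The train-then-assess loop above can be sketched as follows. This is only an illustration: the data values are invented, the feature names (`cash_brought`, `hours_played`) are hypothetical, and a 1-nearest-neighbour rule stands in for whatever model would actually be used.

```python
import math

# Hypothetical labelled training data:
# (cash_brought, hours_played) -> outcome for the business.
train = [
    ((100, 1), "loss"), ((120, 2), "loss"), ((90, 1), "loss"),
    ((500, 6), "profit"), ((450, 5), "profit"), ((550, 7), "profit"),
]

# Held-out data: the model sees the inputs only; labels are used for scoring.
test = [
    ((110, 1), "loss"), ((480, 6), "profit"),
    ((130, 2), "loss"), ((250, 3), "profit"),
]


def predict(features):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min(train, key=lambda example: math.dist(features, example[0]))
    return label


# P, the performance measure: fraction of held-out labels predicted correctly.
correct = sum(predict(features) == label for features, label in test)
accuracy = correct / len(test)
print(accuracy)
```

In terms of E, T, and P: E is the training list, T is labelling an unseen gambler, and P is the accuracy on the held-out list.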

#### Reinforcement learning
- learning to make decisions on the basis of interactions
- Objective 1: find unknown solutions to existing problems (e.g. a chess computer)
- Objective 2: Find solutions to unpredicted problems
- Two entities in RL. Agent & environment.
- Agent interacts with env by acting (its objective is to maximise rewards to itself)
- Environment provides feedback to the agent: a state (describes the impact of the previous action, and the possible next actions) + a reward (a numeric score)
- Exploitation = choosing the current action that maximises the reward
- Exploration = choosing other actions that do not necessarily appear to maximise the reward (choosing action without considering reward)
- The exploitation vs exploration trade-off: the simplest agent will always choose the action with the highest current reward, rather than explore lower-scoring avenues that may score higher in the longer term. A balanced approach is likely more successful.
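
One simple way to balance the two is an epsilon-greedy strategy: explore with a small probability `epsilon`, otherwise exploit. The sketch below applies it to a toy environment with three possible actions. The environment, the reward distributions, and the function name `run_bandit` are assumptions for illustration, not a method taken from these notes.

```python
import random


def run_bandit(true_means, epsilon, steps=2000, seed=0):
    """Epsilon-greedy agent choosing among actions with unknown mean rewards."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions        # how often each action was taken
    estimates = [0.0] * n_actions   # running average reward per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick an action without considering its reward.
            action = rng.randrange(n_actions)
        else:
            # Exploitation: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment responds with a noisy numeric reward.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the running average for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return counts, estimates, total_reward


counts, estimates, total_reward = run_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print(counts, estimates, total_reward)
```

Setting `epsilon=0` gives the purely exploiting agent described above, which can lock onto whichever action happened to pay off first.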


#### Deep learning
- "deep" because the network has many layers
- a broad term in itself: a form of ML loosely inspired by the human brain/animal nervous system
- feature learning or representation learning, which can be supervised, semi-supervised, or unsupervised
- progressive extraction of higher-level features from raw input
- can use "artificial neural networks" (or "neural nets"): algorithms built from interconnected nodes
- the edges between nodes have associated weights, and the network defines rules for passing data from the input layer to the output layer
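
As a rough illustration of data passing from an input layer to an output layer along weighted edges, here is a tiny forward pass in plain Python. The weights are hand-picked rather than learned, and the layer sizes and sigmoid activation are assumptions made for the sketch.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each node sums its weighted inputs,
    adds a bias, and applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]


def forward(x):
    """Pass data from the input layer through the hidden layer to the output."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)


y = forward([1.0, 1.0])
print(y)
```

Training a real network means adjusting the numbers in `hidden_w`, `hidden_b`, `output_w`, and `output_b` automatically; this sketch only shows how a fixed network turns input into output.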


# Steps in the ML process

### 1. Data collection

- unlabelled data (unsupervised)
- labelled historical data (supervised)
- data that helps the agent learn which actions yield most reward (reinforcement)

Considerations: data accuracy (i.e., is it really "ground truth data"), relevance (is it important data for the model aim?), quantity (some models need little, some need a lot), variability (do we capture the full data range?), ethics (informed consent, biases in collection leading to biases in policy, etc.)

```python
# Importing data in pandas
import pandas as pd
```
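
A minimal sketch of loading tabular data with `pd.read_csv`. To keep it self-contained, the CSV content is an in-memory string standing in for a real file; the column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path to a file.
csv_text = """customer_id,visits,avg_spend,label
1,3,12.5,loss
2,10,80.0,profit
3,1,,loss
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types inferred by pandas
print(df.head())   # first few rows
```

The empty `avg_spend` field in the last row is read as a missing value (`NaN`), which is worth noticing at this stage because it affects the later preparation step.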



### 2. Data exploration

- understand data: describe and visualise it
- cleaning, checking for outliers, etc.
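
A short sketch of describing a column and flagging outliers with pandas. The 1.5 x IQR rule used here is one common convention, not the only option, and the data values are invented for illustration.

```python
import pandas as pd

# Hypothetical data with one suspicious value.
df = pd.DataFrame({"avg_spend": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]})

# Summary statistics: count, mean, std, min, quartiles, max.
print(df["avg_spend"].describe())

# Flag values more than 1.5 * IQR outside the middle 50% of the data.
q1 = df["avg_spend"].quantile(0.25)
q3 = df["avg_spend"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["avg_spend"] < q1 - 1.5 * iqr) | (df["avg_spend"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgement call that depends on the data source.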

### 3. Data preparation (steps 1-3 together = ~80% of the time!)

- resolve problems (missing data, noisy data, outliers, class imbalance)
- structure it to be easy to use (normalisation, reduction, etc.)
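
Two of the steps above, filling in missing data and normalising, might look like this in pandas. Median imputation and min-max scaling are assumptions chosen for simplicity; other strategies may suit a given dataset better. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    "visits": [1, 2, None, 4, 5],
    "avg_spend": [10.0, 20.0, 30.0, None, 50.0],
})

# Missing data: replace each gap with the median of its column.
filled = df.fillna(df.median())

# Normalisation: rescale every column to the range 0..1 (min-max scaling).
normalised = (filled - filled.min()) / (filled.max() - filled.min())

print(filled)
print(normalised)
```

After scaling, both columns share the same range, so neither dominates a distance-based model simply because of its units.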

### 4. Modeling

- apply a machine learning approach to the data

### 5. Evaluation

- assess how well it worked, e.g. supervised: are the predicted labels or values correct for unseen data?
- unsupervised: a good model is one that makes sense
- iterate back to the modeling step
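
Accuracy alone can mislead when classes are imbalanced, so it often helps to check precision and recall as well. The sketch below uses invented labels for a spam filter and a hypothetical "model" that always predicts the majority class; `evaluate` is an illustrative helper, not a library function.

```python
# Hypothetical ground truth: 1 = spam (rare), 0 = not spam.
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts "not spam".
y_pred = [0] * 100


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against division by zero when nothing is predicted/labelled positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


accuracy, precision, recall = evaluate(y_true, y_pred)
print(accuracy, precision, recall)
```

Here the accuracy looks high even though the model never catches a single spam message, which is a signal to iterate back to the modeling step.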

### 6. Actionable insight

- identify a course of action based on the model
- do we deploy the model?
- what do we do with the insights from the model?


# Artificial neural networks



# Generative artificial neural networks



- scikit-learn
- PyTorch
- TensorFlow
- Matplotlib
- pandas/NumPy/SciPy