# Prologue - Welcome to the course!

Course name: Non-Neural Machine Learning

## Goals of the course

This course aims to give an introduction to **big data**, **statistical** and **classical machine learning**. The primary objective is to provide participants with a **theoretical** overview and equip them with key **intuitions** and basic **technical skills** to start their work in applied modeling. 

Due to time limitations, the vast size of this field and the dynamically evolving nature of our scientific understanding and of the available tools, a considerable amount of **further work** is needed by the participants later on to hone their skills and understanding. The course material contains frequent pointers to further readings and information sources that can guide the students on their later journey.

Note: big data is covered in the present Non-neural ML course, while the next DL course will be dedicated entirely to artificial neural networks.

## Logistics

Each topic alternates between **Theory**, **Practical implementation** and **"Task"** sessions. The Theory and Practical implementation parts lay the theoretical foundations of the topic and convey the practical knowledge needed to implement the relevant solutions. The Task part centers on tasks related to the theoretical material: course participants implement the solutions themselves in Python (and its multitude of libraries), building on the Practical implementations presented and discussed.

The in-class tasks can be completed either individually or cooperatively. **Interaction with the instructors about the problems is always** strongly **encouraged** (in both the practical and the theoretical parts of the sessions).

### Course requirements and assessment

Assessment is based on course work in the form of a home assignment.

#### Home assignment

Course participants will need to submit a home assignment in the form of an **individual mini-project** at the end of the course. The mini-project will be based on the regular lab-like practice assignments in class, but will require combining multiple solutions and considerations relating to data pipelines. The assignment has to be **prepared and submitted as a Jupyter notebook**, together with a **PDF version** of the notebook.


##### The random generator for the A and B groups:

In [None]:
import numpy as np
import pandas as pd

In [None]:
# Group A is encoded as 0
# Group B is encoded as 1

participants = ["Ben Ayada Ghassen",
                "Bolhassani Alireza",
                "Ashley Nicole Honeycutt",
                "Khezri Asa",
                "Gokul Kiliyara Murikkinchery",
                "Latonio Elaine Marie",
                "Chiemeka Adeboye Madufor",
                "Makvandi Davood",
                "Schin Lotar Csaba",
                "Seyed Iman Seyedi Tabari",
                "Horvath Laszlo"]

df = pd.DataFrame(participants, columns=["Names"])

In [None]:
# Random generator
# Draw 0 (A) or 1 (B) for each participant; once one group reaches its
# capacity, everyone who is left is assigned to the other group.
n = len(participants)
capacity = (n + 1) // 2  # integer capacity, so odd class sizes work too

groups = []
counts = [0, 0]  # number of members in A and B so far

for _ in range(n):
  result = int(np.random.randint(2))
  groups.append(result)
  counts[result] += 1

  if counts[result] == capacity:
    # fill the remaining slots with the other group
    groups.extend([1 - result] * (n - len(groups)))
    break

In [None]:
# Map the numeric codes to group labels (0 -> A, 1 -> B)
df["Groups"] = pd.Series(groups, index=df.index).map({0: "A", 1: "B"})
df

| | Names | Groups |
|---|---|---|
| 0 | Ben Ayada Ghassen | B |
| 1 | Bolhassani Alireza | B |
| 2 | Ashley Nicole Honeycutt | B |
| 3 | Khezri Asa | A |
| 4 | Gokul Kiliyara Murikkinchery | A |
| 5 | Latonio Elaine Marie | B |
| 6 | Chiemeka Adeboye Madufor | A |
| 7 | Makvandi Davood | B |
| 8 | Schin Lotar Csaba | A |
| 9 | Seyed Iman Seyedi Tabari | A |
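
The loop above draws a group for one participant at a time. As a shorter alternative, the following minimal sketch shuffles the names and cuts the shuffled list in the middle. It assumes that any balanced random split is acceptable; `split_into_groups` and the placeholder names `P1`...`P5` are hypothetical and not part of the course material.

```python
import random


def split_into_groups(names, seed=None):
    """Hypothetical helper: randomly split names into groups A and B.

    Group sizes differ by at most one; with an odd number of names,
    group A receives the extra member.
    """
    rng = random.Random(seed)   # a seed makes the split reproducible
    shuffled = list(names)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return {name: ("A" if i < half else "B") for i, name in enumerate(shuffled)}


example = split_into_groups(["P1", "P2", "P3", "P4", "P5"], seed=0)
print(example)
```

Because every ordering of the shuffled list is equally likely, each participant has the same chance of landing in either group, whereas the loop forces the last few participants into whichever group still has room.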


## Course structure

* **Machine Learning:**
  * Introduction and overview: supervised, unsupervised and reinforcement learning
  * Clustering, classification, regression, representation
  * Ensemble learning
  * Overfitting, validation


## What you will need to successfully complete the course (and subsequently use what you've learned)

* **attend the sessions**:
    * you will get to know and understand the theory and see how it is applied in practice;
    * you will get the chance to see how well you can practically implement the various solutions;
    * you will get the chance to interact with the tutors and the other students, thereby gaining a deeper understanding of the material.
        * You can and should take the opportunity to ask the tutors about _any_ points --- be they theoretical or practical --- about which you are unsure!

* **practice**:
    * The course is similar to a maths course inasmuch as you not only need to **understand the theory**, you must also be able to **apply it in practice**. It is one thing to **passively** understand the material, and quite another to **actively** implement it!
    * Just as in the case of maths, the practice that can fit into the sessions is just the **bare minimum** for being able to solve the kind of tasks the course is about. You are encouraged to practise relevant tasks as much as you can!

* **develop the skill to independently discover solutions to your problems**
    * checking the docstrings and/or the documentation (or even the GitHub issues pages) of the functions and packages you use
    * https://stackoverflow.com/ --- "Every data scientist has a tab open to Stack Overflow"
    * perhaps _the_ most important skill you can have as a data scientist!
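
As a small illustration of the first point, a docstring can be read directly from Python. This sketch uses the standard-library function `random.shuffle` purely as an example target; any function or class works the same way.

```python
import inspect
import random

# inspect.getdoc returns the cleaned-up docstring of an object as a string
# (or None if the object has no docstring).
doc = inspect.getdoc(random.shuffle)
print(doc)

# help(random.shuffle) prints the same information interactively;
# in a Jupyter notebook, `random.shuffle?` displays it as well.
```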

## Some further notes

### Primary channel of communication

* By default, **communication is via the MS Teams team channel**: you can post any questions you may have there, but of course you can also direct message or email us tutors if you feel more comfortable that way!
* All **vital** announcements are made via **Moodle News**, so that you automatically get an email about them in your IBS inbox and so that they remain archived at IBS even after the semester ends.


### Breaks (or lack thereof) during classes

* According to the official schedule, there is a 20-minute break after every hour.
* We can be flexible about this depending on your preferences: fewer breaks mean an earlier end, more breaks mean a later end.

### Bug hunting

* You can compete for the **"Bug Hunter of the Week"** title by finding bugs, mistakes or typos in the notebooks!