Home

This website collects material for the PhD courses IProML: Introduction to Programming and Machine Learning in Python

Module 1 and Module 2

The courses are held

in June-July 2021 at Scuola Normale Superiore, Pisa
by Andrea Vandin and Daniele Licari

The right sidebar can be used to navigate the pages related to the course, e.g., to retrieve slides, code, and assignments for our lectures.

SYLLABUS INFORMATION

Course responsibles:

Language: English

Duration:

Module 1: 16h, From 18/06/2021 to 02/07/2021
Module 2: 14h, From 05/07/2021 to 19/07/2021

Description

The course introduces students to programming and data analysis, using Python as the reference language.

Module 1 introduces students to the fundamental principles of structured programming with basic applications to data processing. The module starts from basic notions of programming (variables, data types, collections, control & repetition structures, functions & modules), up to basic data processing functionalities (loading, manipulation, and visualization of CSV data).
Module 2 introduces students to the components of typical data analysis processes and machine learning pipelines. It first builds the necessary toolset by introducing popular Python libraries for data manipulation, visualization, and machine learning (NumPy, Pandas, Seaborn, scikit-learn), applied to simple examples. The toolset is then applied to a more complex case study on the classification of benign and malignant breast cancer, including aspects of data preprocessing, dimensionality reduction, clustering, and classification. The module concludes by presenting KNIME, a popular Python-integrated workflow-based language for data analysis.
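As a rough illustration of the kind of pipeline Module 2 covers, the following minimal sketch chains preprocessing, dimensionality reduction, clustering, and classification with scikit-learn. It assumes the breast cancer dataset bundled with scikit-learn (`load_breast_cancer`); the dataset, the choice of models, and all parameter values are assumptions made for illustration and are not necessarily those used in the lectures.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the breast cancer dataset shipped with scikit-learn
# (labels: 0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set, keeping the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing + dimensionality reduction: standardize, then keep 2 components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_train_2d = reducer.fit_transform(X_train)

# Clustering (unsupervised): group the reduced training data into 2 clusters.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_train_2d)

# Classification (supervised): same preprocessing, followed by a classifier.
classifier = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(),
)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler, the PCA step, and the classifier in a single pipeline ensures that the preprocessing is fitted on the training data only and then reapplied unchanged to the test data.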

A student who has met the objectives of the course will understand the issues and tasks involved in structured computer programming and data analysis well enough to make informed decisions. The student will be able to write Python programs of various kinds, with a focus on complex data analysis and predictive tasks.

Prerequisites: None for Module 1. Module 2 requires knowledge of computer programming at the level obtained by attending Module 1.

Materials

The course makes extensive use of online repositories and game-based e-learning platforms to:

GitHub Wiki: collect and distribute slides, coding examples, assignments, and further course material
Colab: distribute and automatically provide feedback for coding assignments
Kahoot: perform online quizzes to monitor the learning process

Suggested books are

Learning Python, M. Lutz
Python for Data Analysis, W. McKinney
Statistics and Machine Learning in Python, E. Duchesnay, T. Löfstedt, F. Younes

We will use Python as the programming language and statistical software of choice for the course.

Please visit the "Setup your machine" entry in the right sidebar.
Alternatively, you can run all code used in this course on the remote service Colab, which does not require any installation.

Evaluation

Students can attend single modules. These are ‘attività trasversali’ (cross-curricular activities), so there will be no exam. Instead, students receive an attendance certificate (attestazione di presenza), which requires attendance of at least 80%.

Regular coding assignments will be provided to reinforce learning and to let students self-evaluate their progress.

Attendance

Due to restrictions imposed by the COVID-19 pandemic, the course will likely be conducted remotely.
