Andrea Vandin edited this page Apr 14, 2022 · 9 revisions

This website collects material for the PhD courses IProML: Introduction to Programming and Machine Learning in Python

  • Module 1 and Module 2

The courses are held

The right sidebar can be used to navigate pages related to the course, e.g., to retrieve slides, code, and assignments for our lectures.


SYLLABUS INFORMATION

Course coordinators:

Language: English

Duration:

  • Module 1: 16 h, from 18/06/2021 to 02/07/2021
  • Module 2: 14 h, from 05/07/2021 to 19/07/2021

Description

The course introduces students to programming and data analysis, using Python as the reference language.

  • Module 1 introduces students to the fundamental principles of structured programming, with basic applications to data processing. The module ranges from basic notions of programming (variables, data types, collections, control and repetition structures, functions and modules) to basic data processing functionality (loading, manipulation, and visualization of CSV data).

  • Module 2 introduces students to the components of typical data analysis processes and machine learning pipelines. It first builds the necessary toolset by introducing popular Python libraries for data manipulation, visualization, and machine learning (NumPy, Pandas, Seaborn, scikit-learn), applied to simple examples. The toolset is then applied to a more complex case study on the classification of benign and malignant breast cancer, covering data preprocessing, dimensionality reduction, clustering, and classification. The module concludes by presenting KNIME, a popular workflow-based platform for data analysis that integrates with Python.
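The steps of the Module 2 case study (preprocessing, dimensionality reduction, clustering, classification) can be sketched with scikit-learn. This is a minimal illustration, not the course's own notebook: it assumes the Wisconsin diagnostic breast cancer dataset bundled with scikit-learn (`load_breast_cancer`), and the choices of PCA with two components, k-nearest neighbours with k = 5, and k-means with two clusters are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Breast cancer data bundled with scikit-learn:
# 569 samples, 30 numeric features, labels 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing (standardization) + dimensionality reduction (PCA)
# + classification (k-nearest neighbours), chained in one pipeline.
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of correctly classified test samples
print(f"Test accuracy: {accuracy:.3f}")

# Clustering: group the standardized, PCA-projected samples into two
# clusters without looking at the labels.
projected = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(projected)
print("Cluster sizes:", [int((clusters == k).sum()) for k in range(2)])
```

Wrapping the scaler and PCA inside the classification pipeline means they are fitted on the training split only, so no information from the test set leaks into preprocessing.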

Students who meet the course objectives will understand the issues and tasks involved in structured computer programming and data analysis, and will be able to make informed decisions about them. They will be able to write Python programs of various kinds, with a focus on complex data analysis and predictive tasks.

Prerequisites: none for Module 1; Module 2 requires the computer programming knowledge provided by Module 1.

Materials

The course makes extensive use of online repositories and game-based e-learning platforms:

  • GitHub Wiki: to collect and distribute slides, coding examples, assignments, and further course material
  • Colab: to distribute coding assignments and automatically provide feedback on them
  • Kahoot: to run online quizzes that monitor the learning process

Suggested books are

  • Learning Python, M. Lutz
  • Python for Data Analysis, W. McKinney
  • Statistics and Machine Learning in Python, E. Duchesnay, T. Löfstedt, F. Younes

We will use Python as the programming language and statistical software of choice for the course.

  • Please visit the "setup your machine" entry in the right sidebar
  • Alternatively, you can run all code used in this course on Colab, a remote service that requires no installation
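Whether you install Python locally or use Colab, a short script can confirm that the libraries named in the Module 2 description are importable. This is a sketch, not part of the official course material: `check_environment` is a hypothetical helper, and the library list is taken from the module description (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names of the libraries listed in the Module 2 description.
COURSE_LIBRARIES = ["numpy", "pandas", "seaborn", "sklearn"]


def check_environment(libraries=COURSE_LIBRARIES):
    """Return a dict mapping each import name to True if it can be found."""
    # find_spec locates a top-level module without importing it and
    # returns None when the module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in libraries}


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, found in check_environment().items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

Any library reported as `MISSING` can then be installed following the setup instructions.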

Evaluation

Students can attend individual modules. The modules are transversal activities ("attività trasversali"), hence there is no exam; instead, students receive an attendance certificate ("attestazione di presenza"), which requires a mandatory attendance of at least 80%.
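As a worked example of the 80% threshold, the snippet below converts it into minimum attendance per module. It assumes the threshold applies to the scheduled hours listed under Duration; the page does not say exactly how attendance is counted, and `minimum_minutes` is a hypothetical helper.

```python
# Module durations in hours, as listed under "Duration".
MODULE_HOURS = {"Module 1": 16, "Module 2": 14}
MIN_ATTENDANCE_PERCENT = 80


def minimum_minutes(total_hours, percent=MIN_ATTENDANCE_PERCENT):
    """Minimum attendance in whole minutes, rounded up so the threshold is met."""
    total_minutes = total_hours * 60
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-total_minutes * percent // 100)


for module, hours in MODULE_HOURS.items():
    minutes = minimum_minutes(hours)
    print(f"{module}: at least {minutes // 60} h {minutes % 60} min out of {hours} h")
# Module 1: at least 12 h 48 min out of 16 h
# Module 2: at least 11 h 12 min out of 14 h
```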

Regular coding assignments will be provided to reinforce learning and to let students self-evaluate their progress.

Attendance

Due to restrictions imposed by the COVID-19 pandemic, the course will likely be held remotely.