![IE](img/ie.png)

# Python for Data Analysis I

## Master in Business Analytics & Data Science, October 2025

### Professor: Juan Luis Cano Rodríguez <jcano@faculty.ie.edu>

1. What's your name?
2. What's your dream job title after you finish this Master?
3. When was the last time you did something you were awful at?

# Outline

- Who am I?
- What is Python? Why Python?
- PyData Ecosystem
- About this course
    - Calendar, sessions
    - Material
    - Software
    - Learning objectives
    - Evaluation method

# Who am I?

![Me](img/juanlu-ubuntu-summit-2023.jpg)

- **Aerospace Engineer** from TU Madrid + 1 year at Politecnico di Milano
- **Developer Relations Engineer** at [**Canonical**](https://canonical.com/), the makers of Ubuntu Linux
- Working with **Python** for 10+ years in different roles, including as Product Manager for Kedro at McKinsey & Company
- **Contributor** to the PyData ecosystem: NumPy, SciPy, conda, Dask, ...
- **Instructor** of Python courses for Data Scientists at Airbus, Boeing R&T, Telefónica and others
- Honorary member of the **Python España** non-profit, former co-organizer of **PyCon Spain**, and current organizer of [**PyData Madrid**](https://guild.host/pydata-madrid/)

# What is Python?

```python
import pandas as pd

# Load a CSV file into a DataFrame ("data.csv" is a placeholder path)
df = pd.read_csv("data.csv")
# Show the first five rows
print(df.head())
```

* **Python is** a dynamic, interpreted programming language
* It is easy to learn, but powerful
* Features a huge ecosystem of contributed packages
* The backbone of the AI revolution
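
A small illustration of what "dynamic" means in practice (the variable names are made up for this example): a name is not tied to a type, so it can be rebound to values of different types, and type errors only show up when the code runs.

```python
# A name can be rebound to values of different types at runtime
value = 42
print(type(value).__name__)  # int

value = "forty-two"
print(type(value).__name__)  # str

# Type errors only surface when the offending line actually runs
try:
    result = value + 1  # str + int is not allowed
except TypeError as exc:
    result = None
    print(f"TypeError: {exc}")
```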

## Why is Python so popular?

>  Last month, Python reached the highest ranking a programming language ever had in the TIOBE index. We thought Python couldn't grow any further, but AI code assistants let Python take yet another step forward.

([source](https://web.archive.org/web/20250831052739/https://www.tiobe.com/tiobe-index/))

![TIOBE index, September 2025](img/tiobe-python-2025.png)

* Python was already popular for scripting, science and web development in the early '00s
* In the following years, **Data Science and Machine Learning, then Deep Learning, then AI** made it incredibly popular

More interesting surveys:

* Anaconda State of Data Science 2024: "Python continues to serve as the foundation of data science, with 93% of respondents using it in some capacity" https://www.anaconda.com/resources/report/state-of-data-science-report-2024
* Kaggle Survey 2022: "Python and SQL remain the two most common programming skills for data scientists" https://www.kaggle.com/kaggle-survey-2022

And lastly... because **programming feels like a superpower**!

![Superpower](img/superpower.jpg)

# PyData ecosystem

![PyData ecosystem](img/ecosystem/1.png)

![PyData ecosystem](img/ecosystem/2.png)

![PyData ecosystem](img/ecosystem/3.png)

![PyData ecosystem](img/ecosystem/4.png)

![PyData ecosystem](img/ecosystem/5.png)

## Disadvantages

* Python itself is _slow_ - but you won't notice if you use its specialized libraries (pandas, NumPy)
* Weaker tooling for time series and statistical analysis (you might want to use R for that)
* Lack of type enforcement can cause unexpected bugs in production
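
A minimal sketch of the first point, assuming NumPy is installed (it ships with Anaconda and Colab). The array size and the sum-of-squares task are arbitrary choices for illustration. Both versions compute the same result, but the NumPy one runs the loop in compiled code instead of the Python interpreter, so it is usually much faster.

```python
import time

import numpy as np

n = 1_000_000
numbers = np.arange(n, dtype=np.float64)

# Pure Python: the interpreter executes one iteration at a time
start = time.perf_counter()
total_loop = 0.0
for x in numbers.tolist():
    total_loop += x * x
loop_seconds = time.perf_counter() - start

# NumPy: the same computation as a single vectorized expression
start = time.perf_counter()
total_numpy = float(np.sum(numbers * numbers))
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"NumPy: {numpy_seconds:.4f} s")
print("same result:", np.isclose(total_loop, total_numpy))
```

Exact timings depend on your machine, so run it yourself and compare.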

# About this course

## Learning objectives

1. Understand the basics of the Python programming language syntax
2. Learn how to perform tabular data manipulation with pandas
3. Gain hands-on experience in how to use notebook interfaces (Jupyter, VS Code) to conduct exploratory analysis
4. Learn how to understand Python code written by another person or GenAI system
5. Learn how to solve algorithmic problems with Python

## Calendar

* **From Sep 30th to Dec 16th**
  - October: Python and Jupyter basics, mid-term exam
  - November: pandas (2-week break)
  - December: pandas, group presentations, final exam
* **All sessions will be hands-on**, so have your Colab notebooks ready!

## Evaluation method

| Criteria                   | Score % | Description |
|----------------------------|---------|-------------|
| Class Participation        |  10 %   | Attendance, attitude, questions posed during the class |
| Mid-term exam              |  25 %   | Proctored test covering all the material taught until the mid-term date |
| Group Project Deliverable  |  15 %   | Code deliverable |
| Group Project Presentation |  25 %   | Questions posed to the group about the code deliverable |
| Final Exam                 |  25 %   | Proctored test covering the whole syllabus |

Summary:
- 60 % individual exams + participation
- 40 % group project

## Individual exams

* Two exams: 1 mid-term, 1 final
* Proctored test
* Details coming soon

## Group project

* Analyze a small dataset, write a report, and present the results
* You will have to apply everything we cover in the course
* Details coming soon

## Participation

* Everybody starts with the maximum participation grade
* You may lose points for lack of engagement, arriving late, and other inappropriate behavior
* Check the University's [Code of Conduct](https://www.ie.edu/student-academic-standards/academic-integrity/#code-conduct), [Attendance Policy](https://www.ie.edu/student-academic-standards/academic-policies/), and [Ethics Code](https://docs.ie.edu/university/NEW-ethics-code.pdf) 

## AI policy

> **Generative artificial intelligence (GenAI) tools may be used in this course for coding, debugging, research, ideation, and other uses with proper acknowledgement.**

The evaluation system is designed so that the bulk of the grade is driven by what you **understand**, not by your throughput.

AI assistants are like a friend spotting you on pull-ups. It's nice at first, but you should strive to do them on your own.

![Pull-ups](img/ai/assisted-pull-ups.jpg)

### What about _vibe coding_?

- Vibe coding, as [initially described by Andrej Karpathy](https://x.com/karpathy/status/1886192184808149383), is all about _not looking at the code_
- Therefore, it's essentially incompatible with learning how to _code_

![Vibe coding](img/ai/vibe-coding.png)

### Okay, but can I use it?

- Yes, you can use it
- For any take-home assignment, I will assume you used it (so I will grade you on your _understanding_)
- Exams will be non-digital (paper, or live interviews)

### Advice

- Ask the AI to **explain** the code until _you_ understand it

- **Always assume the AI output is wrong**, and as such:
  1. Always verify that the code _works_
  2. Always cross-check with the official documentation of the libraries you are using
  3. Always seek to understand the code that the AI produced
  4. Compare the output of different AI assistants
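
One way to put point 1 into practice is a handful of `assert` checks against cases whose answer you already know. The sketch below assumes a hypothetical AI-generated function `median_of`. The name and the test cases are invented for illustration, and the standard library's `statistics.median` serves as an independent reference.

```python
import statistics


def median_of(values):
    """Hypothetical AI-generated function that we want to verify."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Check small cases that can be worked out by hand
assert median_of([3, 1, 2]) == 2
assert median_of([4, 1, 3, 2]) == 2.5

# Cross-check against a trusted implementation
data = [7, 3, 9, 1, 5, 8]
assert median_of(data) == statistics.median(data)
print("All checks passed")
```

Passing checks do not prove the code is correct, but a failing one tells you right away that something is off.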

Remember: **you** are ultimately responsible for what you deliver.

![IBM computer accountability](img/ai/computer-accountability.jpg)

## Software

- Web-based: **Google Colab** (accessible with your IE account)
- Local: **Anaconda Distribution 2025.06** or later
  - Compatible with Windows, Mac and Linux
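
To confirm that a local installation is ready, a quick check like the following can help. It is only a sketch: the list of packages is an assumption about what the course will use, and the function name `installed_versions` is made up for this example.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a mapping of package name to installed version (or None)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


print("Python", sys.version.split()[0])
for name, ver in installed_versions(["numpy", "pandas"]).items():
    print(name, ver if ver is not None else "NOT INSTALLED")
```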

Other:
- Chat: **Mattermost** (link shared internally)

## Material

Exercises, notes, datasets: https://github.com/astrojuanlu/ie-mbd-python-data-analysis-i/

### Extra reading

- Main content: "A Whirlwind Tour of Python" by Jake Vanderplas https://github.com/jakevdp/WhirlwindTourOfPython
- pandas: "Python Data Science Handbook" by Jake Vanderplas https://github.com/jakevdp/PythonDataScienceHandbook

...yes, everything is freely available on the Internet :)

Have you ever wondered why, with so much free material and so many courses, there's such a shortage of programmers?

...probably because learning how to code is _damn hard_.

Embrace the challenge, and seize the opportunity to do it with others!

# How to succeed in this subject

- Develop a **bias for action**: instead of asking for _more_ reading material, **write as much code as possible**
  - In other words: **practice, practice, practice**
- Sometimes you will feel dumb and frustrated: it's normal, and you will get over it
- Ask in class, ask in the chat: I am here to help you
- Assume I'm your **tech lead**, not your _client_

# Shall we begin?

![Talk is cheap](img/quote-talk-is-cheap-show-me-the-code-linus-torvalds-273528.jpg)