# Introductions

My name is Sergey Antopolskiy

I am a postdoc at Mathew Diamond's Tactile Perception and Learning Lab

# Organizational

### First

We have people with diverse backgrounds in programming, so it is inevitable that for some of you the pace of the course will be too fast and for others it will seem too slow. 

Here are my suggestions for these groups of people.

If you belong to the first group, you will need to invest more time between lectures in reviewing the course materials and, more importantly, practicing. Set aside some time for that; in particular, try to free the first week of the course from other major commitments. I am organizing *office hours*, a time during which you can come and ask for my help (more on that below). Use them to catch up.

If you feel like I am going too slowly, look through the materials ahead and try to apply them to your data. Look at the additional materials in the "where to go from here" section; I will put links to some advanced topics there as well. Also, look through the materials for the next lectures in advance and see what might interest you and what is already familiar to you. I will try to announce a day ahead what we will cover in the next lesson.

### Second

I really want this course to be as interactive as possible. This is always a challenge for the instructor. Here is how I want to approach it:
- I will ask you questions during the lectures, for example asking you to raise your hands if you know something. This is not only to engage you, but also to understand how many of you know a topic and how much detail I need to go into
- I might ask some of you to explain something, if you said that you know about it
- I might ask you to propose some explanation or ideas

The aim of this course is two-fold. On the one hand, I want you to learn certain concepts from computer science and programming. Some of them you will be able to apply directly to your work; others will help you find the relevant information later, when you need it. On the other hand, I want to teach you some practical skills that you will be able to apply immediately to your work. In fact, I encourage you to do so as we go, and I will help you as much as I can. Then we can look together at what you did and how you did it, which will be helpful for everyone.

Course materials here: http://nbviewer.jupyter.org/github/antopolskiy/sciprog/tree/master/

Course Slack channel for discussions, questions, and announcements: https://sciprog.slack.com/ (you can sign up with an @sissa.it email; if you don't have one, send an email to the course instructor).

# Office hours: come and talk to me!

I can answer your questions, help you with assignments or application of the concepts to your data.

- Tuesday from 16:00 to 18:00
- Thursday from 16:00 to 18:00

Find me in office 324. You can drop in freely during these periods, but it is better to write to me first on Slack or by email.

- Saturday from 10:00 to 12:00 (You will need to tell me in advance that you will come)

# Topics overview

- Python programming basics
- Numerical arrays
- Data organization principles (tidy data vs messy data)
- Data manipulation in Python
- Data collection: `psychopy` (Davide Crepaldi), May 2-9
- Visualization
- Data analysis in Python
- Bootstrap and statistical simulations
- Machine learning: core principles and practice
- Advanced topics (if we have time)

# Let's get started

## Why learn programming? 

## Why not use Excel or other statistical tools with easy user interface?

### - scripting and modularity
### - speed and memory efficiency
### - freedom to try (almost) anything

Freedom is particularly important in research and development, because you want to push the boundaries; you don't want to walk only the "main road".

There are other practical reasons. Programming is the *lingua franca* of industry and technology. If you ever want to leave academia, it is very likely that your best chances of an interesting and fulfilling job are in the tech industry. In that case you absolutely need to be fluent in programming. Even if you want to stay in academia, you don't want to be *forced* to stay in academia. Doing anything, even something you like, while feeling that you have no choice is a sure way to become frustrated and stressed, and eventually to start hating the thing you liked. Besides, these days the gap between academia and industry is shrinking. So learning at least one programming language is one of the best investments in your future you can make.

# Ok, so we learn to program... but what?

### C++? Fortran? Pascal? Basic? Assembler?! No.

# The language we need is:

### Interpreted, not compiled (at least at first)

### High-level, not low-level

### Convenient for working with data

### (Relatively) easy to learn

What is the difference between interpreted and compiled languages, at least in a nutshell? Why are compiled languages usually so fast, while interpreted ones are fast only if you write the code the right way? Short answers: static typing vs dynamic typing, and memory management.
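To make the "dynamic typing" part of the answer concrete, here is a minimal Python sketch (the names `x` and `add` are only illustrative, they are not from the course materials). A name can be rebound to values of different types, and the meaning of `+` is decided at run time on every call. This is convenient, but it is extra work for the interpreter; a statically typed, compiled language settles these questions before the program runs.

```python
# Dynamic typing: the same name can refer to values of different types.
x = 42
type_before = type(x).__name__
x = "forty-two"
type_after = type(x).__name__

def add(a, b):
    # No declared types: Python decides what "+" means at run time,
    # each time the function is called.
    return a + b

int_sum = add(1, 2)        # integer addition
str_sum = add("1", "2")    # string concatenation

# A type error surfaces only when the offending line actually runs.
try:
    add(1, "2")
    error_name = None
except TypeError as err:
    error_name = type(err).__name__

print(type_before, type_after)
print(int_sum, str_sum, error_name)
```

In a statically typed language the call `add(1, "2")` would typically be rejected before the program ever starts.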

# Our main choices are: MATLAB, R or Python

# Which language to choose?

### It doesn't matter

### Things to consider:

- You need to use big chunks of someone else's code 

- You want to use a particular library or package (Brainstorm, Psychopy)

However, remember that you don't need to use the same language at all stages of your process. Switching between several languages may prove tedious at first, but it is usually simpler than it seems. You can easily run your experiment in one language and analyse the data in another.

- You want to work locally on any computer

Matlab is paid software, and it requires you to have a licence. Even if your university has a licence, it is likely available only when you are connected to the university network. Therefore, it can be difficult to work remotely.

# Pros and cons
(disclosure: somewhat subjective)

<img src="https://www.mathworks.com/content/mathworks/www/en/company/newsletters/articles/the-mathworks-logo-is-an-eigenfunction-of-the-wave-equation/_jcr_content/mainParsys/image_2.img.gif/1469941373397.gif" align="left" alt="Drawing" style="width: 80px;"/>

# Matlab 

Pros:
- out-of-the-box solution
- easy to start
- decent documentation: not great, but good enough
- really fast for linear algebra
- you write in (almost) mathematical notation
    
        X = [1 2 3; 4 5 6]
        X_transpose = X'
        
- "matlab apps"
- "parfor" allows easy parallel loops
- Simulink

<img src="https://www.mathworks.com/content/mathworks/www/en/company/newsletters/articles/the-mathworks-logo-is-an-eigenfunction-of-the-wave-equation/_jcr_content/mainParsys/image_2.img.gif/1469941373397.gif" align="left" alt="Drawing" style="width: 80px;"/>

# Matlab 

Cons:
- expensive, need to pay for add-ons (toolboxes)
- developed by selected experts
- proprietary code ("closed source")
- slow development 
- plotting and exporting figures is not great
- data manipulation is clunky
- huge overhead costs (memory and CPU)
- not widely used outside academia
- small community

<img src="https://www.r-project.org/logo/Rlogo.svg" align="left" alt="Drawing" style="width: 80px;"/>

# R

Pros:
- developed by statisticians
- great plotting capabilities
- great data manipulation capabilities
- vibrant community
- high demand on the data science market

<img src="https://www.r-project.org/logo/Rlogo.svg" align="left" alt="Drawing" style="width: 80px;"/>

# R

Cons:
- developed by statisticians
- sometimes obscure syntax
- slow
- lots of packages, sometimes without a "standard"
- relatively steep learning curve
- not a general-purpose language

<img src="https://www.python.org/static/opengraph-icon-200x200.png" align="left" alt="Drawing" style="width: 80px;"/>
# Python

Pros:
- really well designed language
    - gets the best of both worlds: computational speed like MATLAB (`numpy` package) and data manipulation like R (`pandas` package)
- (relatively) easy to learn
- a lot (!) of great resources
- great string manipulation (*de-facto* standard for NLP; R and MATLAB are not even close)
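As a small taste of the string manipulation point, here is a sketch that uses only the standard library (the example sentence and variable names are made up for illustration). It cleans up a sentence, counts word frequencies, and builds a new string.

```python
from collections import Counter

sentence = "The quick brown fox jumps over the lazy dog; the dog sleeps."

# Split on whitespace, strip punctuation from each token, normalize case.
words = [w.strip(".,;!?").lower() for w in sentence.split()]

# Count how often each word occurs and pick the most frequent one.
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

# Build a new string from the first four words.
title = " ".join(w.capitalize() for w in words[:4])

print(top_word, top_count)  # the 3
print(title)                # The Quick Brown Fox
```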

<img src="https://www.python.org/static/opengraph-icon-200x200.png" align="left" alt="Drawing" style="width: 80px;"/>
# Python

Pros:

- object-oriented and introspective
- general purpose (learn for data analysis, but use for anything!)
- standardized "stack" of packages for data analysis
- Jupyter notebooks
- huge and welcoming community offline (PyData) and online (on Stack Overflow)
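"Introspective" means that you can ask any Python object about itself while the program is running. A minimal sketch, where the `Recording` class is a hypothetical example and not something from the course materials:

```python
import inspect

class Recording:
    """Hypothetical container for one experimental session."""

    def __init__(self, subject, samples):
        self.subject = subject
        self.samples = samples

    def mean(self):
        """Return the average of the samples."""
        return sum(self.samples) / len(self.samples)

rec = Recording("rat01", [1.0, 2.0, 3.0])

# Ask the object what it is and what it offers.
class_name = type(rec).__name__
public_names = [n for n in dir(rec) if not n.startswith("_")]

# Ask a function which parameters it takes and what its docstring says.
params = list(inspect.signature(Recording.__init__).parameters)
doc = inspect.getdoc(rec.mean)

print(class_name)    # Recording
print(public_names)  # ['mean', 'samples', 'subject']
print(params)        # ['self', 'subject', 'samples']
print(doc)           # Return the average of the samples.
```

This is the same machinery that lets tools such as Jupyter show you tab completion and help text for objects you have just created.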

<img src="https://www.python.org/static/opengraph-icon-200x200.png" align="left" alt="Drawing" style="width: 80px;"/>
# Python

Cons:
- None!

Ok, I am joking: I am an enthusiast, not a fanatic :D

<img src="https://www.python.org/static/opengraph-icon-200x200.png" align="left" alt="Drawing" style="width: 80px;"/>
# Python

Cons:
- uneven documentation 
    - some things are amazingly well documented: scikit-learn, pandas, matplotlib
    - others, not so much: wavelets, psychopy, etc.
- can be slow if you do things the wrong way
- sometimes the syntax may be too verbose
- some things are still missing
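One illustration of "slow if you do things the wrong way", as a sketch with made-up data (the names `ids`, `queries` and the two functions are hypothetical). Both functions give the same answer, but the first scans a list for every query, while the second uses a set, which is designed for fast membership tests. The exact timings depend on your machine, so none are quoted here.

```python
import timeit

ids = list(range(10_000))
queries = list(range(0, 20_000, 40))  # 500 queries, half of them not in ids

def count_hits_list():
    # "q in ids" walks through the list element by element.
    return sum(1 for q in queries if q in ids)

ids_set = set(ids)

def count_hits_set():
    # "q in ids_set" is a hash lookup, independent of the set size.
    return sum(1 for q in queries if q in ids_set)

hits_list = count_hits_list()
hits_set = count_hits_set()

# Time a single run of each version.
time_list = timeit.timeit(count_hits_list, number=1)
time_set = timeit.timeit(count_hits_set, number=1)

print(hits_list, hits_set)
print(f"list: {time_list:.4f} s, set: {time_set:.4f} s")
```

The point is not this particular trick, but that choosing the right data structure or library function often matters more than the language itself.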

# Before we go into nitty-gritty details

- we will look (mostly) at Python, but if you cannot find how certain things are done in Matlab, ask on Slack or during office hours; for R ask Davide Crepaldi (dcrepaldi@sissa.it)

- in the classroom focus on the conceptual understanding of what you **can** do, as opposed to **how** you do it (you will have the course materials for that)

- ask questions at any point if something is unclear

- outside the classroom focus on **how** you do things and try them for yourself; bring your questions to Slack, to the next lesson, or to office hours

- the goal of the class is for you to be able to apply concepts to your own work

Who here has their own data? If you don't have data, ask in your lab, find some online, or ask me.