# About this course

This course aims to equip you with a practical and intellectual toolbox for manipulating and analysing data. The first part consists of learning the Python programming language, currently the most popular language in data science, and in astronomy in particular. The second part uses Python for signal processing applications (Fourier analysis, sampling, analysis of time series). In the third part, we will use Python to acquire a *practical* understanding of important statistical concepts and methods for analysing data (how to properly estimate errors, derive trends in data points, ...). For this, we will not only review the theory of data analysis but also spend time learning to use existing tools that implement those concepts. We will mostly pick topics among the following: classical (frequentist) statistical inference (i.e. Maximum Likelihood Estimation, confidence intervals via jackknife and bootstrapping, hypothesis testing) and Bayesian statistical inference (MCMC for confidence interval estimation, model selection).
 
We know that if you chose to study physics and astronomy, it was probably not to learn coding. However, writing lines of code to execute routine tasks, do complex calculations, dig into large databases or simply analyse data is part of an astro-/geo-physicist's life. This class will provide you with a *toolbox* from which you will pick tools to extract all the relevant information from the data you have in hand. Knowing where to find a tool is not sufficient! We want to understand how to use these tools correctly and to understand the concepts that make them so powerful!

The lecture is given using **[Jupyter notebooks](01-Intro_to_python/Jupyter.ipynb)**. We will spend some time explaining in more detail how Jupyter notebooks work, but in short, what you have to know is that the notebook enables us to bypass the classical three-step procedure generally used with compiled programming languages, namely (1) you write the code; (2) you compile the code; (3) you execute the code. The notebook environment lets you run any Python code within a user-friendly environment in which you directly see the output of your code, add text in a visually appealing way with enriched fonts, and display images (and even videos). You can read the notebooks as you read a web page, but you can also modify any part of them very easily by clicking on a cell, and run it just as easily. If you wish, you can also export the notebook as a PDF file (other export options exist).

In those notebooks, you will often see a few lines starting with "**Notes**". Those notes can generally be skipped on a first reading, but may become more insightful once the reader (you!) is familiar with the topic covered by the main text.

These notebooks may evolve with time. Any comments and suggestions are welcome, in order to improve your learning experience and that of others.


## This course in practice

The lectures generally combine the use of slides, where the topics are introduced, and [Jupyter notebooks](http://jupyter.org/install) that enable interactive experimentation and practical understanding. For this to work, we will use the classroom computers, but we strongly encourage you to bring a laptop with your own installation of (a) a Python coding environment and (b) the git version control tool.

## Python installation (on your personal laptop/computer)

The easiest way to proceed is to first install the conda package management system: https://conda.io/docs/ (read that documentation to understand how it works).

In summary, you have two choices of Python distributions:
- 1) install `anaconda`, which comes with many pre-installed Python packages and easy package and environment management: https://www.anaconda.com/download/
- 2) install `miniconda`: this is a lightweight version of `anaconda` in which you only install what you really need. https://conda.io/miniconda.html
 
If disk space is not a problem, go for (1), i.e. `anaconda` (if you define multiple environments, you might easily need up to 20 GB). Option (2) is also a viable alternative.
 
During the lecture, we use a version of Python more recent than 3.5. If you make a fresh installation, you can take e.g. version 3.11.
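
As a quick check of the version requirement, the following minimal sketch can be run in any Python session. The helper name `python_is_recent_enough` and the default threshold `(3, 6)` (our reading of "Python > 3.5") are illustrative assumptions, not part of the course material.

```python
import sys


def python_is_recent_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least version `minimum`.

    `minimum` is a (major, minor) tuple; (3, 6) is assumed here to be the
    first release satisfying "Python > 3.5".
    """
    return sys.version_info >= minimum


# Show the version of the interpreter that is actually running this code.
print("Running Python", sys.version.split()[0])
print("Recent enough for the course:", python_is_recent_enough())
```

If the last line prints `False`, the interpreter in use is probably not the one from the conda installation described here.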

Once conda is installed, you will have to install some "libraries" (called modules in Python) as well as Jupyter for reading and running code interactively.

### Installing libraries

You can install libraries from the command line or through `anaconda-navigator` (if you have installed the anaconda distribution). While `anaconda-navigator` is more user friendly, it is sometimes a bit clunky. For that reason (but not only) we encourage you to use the command line. Whichever option you choose, we encourage you to create a virtual environment hosting the libraries associated with this lecture (see below).

A common problem when programming with external libraries is dependencies and conflicts between library versions. To manage this problem, conda offers the possibility to create "virtual environments". You can see those environments as different sets of tools that you use for different tasks. That's what we will do for this lecture.

#### Step by step command-line installation

- Open a terminal: on Windows, run the program `anaconda prompt` (via the programs menu). On Linux and macOS, run `Terminal`.
- Create a [conda environment](https://www.freecodecamp.org/news/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c) with the main libraries needed for this lecture, by typing the following command line:
    ```bash
    conda create -c conda-forge --name py_SPAT0002 python=3.11 numpy scipy matplotlib astropy emcee ipykernel nb_conda_kernels notebook scikit-learn 
    ```
- Activate your environment using the following command-line:
  ```bash 
  conda activate py_SPAT0002
  ```
- Launch Jupyter Notebook by typing the following command line:
  ```
  jupyter-notebook
  ```
- Test your installation by running the notebook [Test_config.ipynb](Test_config.ipynb)
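
Independently of the test notebook, a rough sanity check of the environment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `missing_modules` and the list `COURSE_MODULES` are hypothetical, and the list simply mirrors the packages named in the `conda create` command above (note that `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names of the main packages requested in the `conda create` command.
# Assumption: this list is illustrative and may not match Test_config.ipynb.
COURSE_MODULES = ["numpy", "scipy", "matplotlib", "astropy", "emcee", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# Report which modules (if any) are not available in the active environment.
missing = missing_modules(COURSE_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If some modules are reported as missing, the most likely cause is that the `py_SPAT0002` environment has not been activated in the terminal from which Python was started.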

## Managing the notebooks

The notebooks for this lecture are stored in a [GitHub repository](https://github.com/SPAT0002-1/) (`repos`).

The best way to use these notebooks is to use `git` (see below) to download the repository and then keep your local copy up to date. This is not difficult. A very good introduction to git can be found here: https://swcarpentry.github.io/git-novice/ or https://github.com/drphilmarshall/GettingStarted#top (the latter even includes a link to a YouTube video). Since git does not require an external repository, you can use this version control tool to track any project on your computer. This works for code, but also for e.g. a paper or a thesis you are writing. Instructions for installing git can be found at https://git-scm.com/book/en/v2/Getting-Started-Installing-Git.

## Git installation

If git is not already installed on your machine (you can check this by typing `git --version` in a terminal window), you may follow the instructions on [this page](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). In short, stand-alone installers are available at the following links, depending on your OS:
- Windows:  https://git-scm.com/download/win 
- macOS: if git is not installed, a binary installer is available [here](https://sourceforge.net/projects/git-osx-installer/), but it may not provide the latest version. Alternatively, a slightly more demanding option requires installing Xcode, Homebrew or MacPorts (see https://git-scm.com/download/mac).
- Linux-Ubuntu: `sudo apt-get install git`
- Linux Red Hat/Fedora: `sudo yum install git` or `sudo dnf install git`

## Git in a nutshell

If you need further clarification about what git is, here are a few useful references:

- Watch YouTube Video: [A brief introduction to Git for Beginners](https://www.youtube.com/watch?v=r8jQ9hVA2qs&list=PL0lo9MOBetEFcp4SCWinBdpml9B2U25-f&index=1)
- Read the corresponding article (including some installation tips): [What is Git? Our beginner’s guide to version control](https://github.blog/developer-skills/programming-languages-and-frameworks/what-is-git-our-beginners-guide-to-version-control/)

## First Lecture 

In practice, to keep up to date with the lectures, you will end up typing the following commands in a terminal.

The first time (terminal commands for Linux or macOS):
```shell
mkdir SPAT0002   # create a directory to save a local copy of the repository
cd SPAT0002      # move into that directory
git clone https://github.com/SPAT0002-1/Ongoing.git   
cd Ongoing
```

### At the start of a lecture 

For simplicity, you should get into the habit of *COPYING* and *RENAMING* the notebook we will be working with, so that you clearly identify your **own** version of the notebook, to which you can add notes of any kind. For instance, if the notebook we work on is named `notebook.ipynb`, you may copy it to `notebook_MYNAME.ipynb`.
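
The copy can of course be made with your file manager, but it can also be scripted. The sketch below is one possible way to do it in Python; the helper name `make_personal_copy` and its arguments are hypothetical and only follow the `notebook_MYNAME.ipynb` naming suggested above.

```python
import shutil
from pathlib import Path


def make_personal_copy(notebook, tag):
    """Copy `notebook` next to itself, appending `_tag` before the extension.

    Example: notebook.ipynb with tag "MYNAME" becomes notebook_MYNAME.ipynb.
    An existing personal copy is never overwritten.
    """
    source = Path(notebook)
    target = source.with_name(f"{source.stem}_{tag}{source.suffix}")
    if target.exists():
        raise FileExistsError(f"{target} already exists; your notes are kept as-is")
    shutil.copy2(source, target)  # copy the file content and its metadata
    return target


# Usage (uncomment in the directory that contains the notebook):
# make_personal_copy("notebook.ipynb", "MYNAME")
```

Refusing to overwrite an existing copy is a deliberate choice here, so that running the helper twice cannot erase notes you have already added.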

### Before another lecture

Another day, when you know that the course repository has been updated, *go into the directory containing your repository* and run the command below. (But see [Notes](#Notes:) about error messages you may encounter during a subsequent `pull`.)

``` shell
git pull
```

**If you have a problem with this, get error messages, or see anything odd (e.g. we told you that there is new material but `git pull` does nothing): LET US KNOW!**  

Then, you may launch Jupyter Notebook by typing the following command in a terminal, and then navigate to the notebook of interest:

``` bash
jupyter-notebook 
```

You can access the notebook of the lecture through the index or by navigating directly through the directories. If you want to keep your own version, do not forget to make a copy of the notebook as explained [here](#At-the-start-of-a-lecture).

To go beyond this, you may need to learn some more git commands or use a git client with a nice GUI (Graphical User Interface). Even if a client is more user friendly, it does not help you understand how git works behind the scenes.  
Various clients (free or not) exist. You might be interested in GitKraken, GitHub Desktop, SourceTree, ... Many web pages describe the pros and cons of the alternative clients.

### Notes: 
1. If there is an update to a file that you have already modified, you will encounter an error message because the state of the file differs between the remote repository and your local directory. The best thing to do is to rename the conflicting files if you want to keep your version of them (e.g. you are annotating the notebook we are working on). An alternative is to "stash" your local changes using `git stash` (https://git-scm.com/book/en/v1/Git-Tools-Stashing) **BUT beware**: `git stash` will stash every modified file that is also present on the remote repository (not only the files that yield an error message after your `git pull`). It is possible to restore stashed files (using `git stash apply`) but things can sometimes get tricky...  

2. If you work on a specific project, you can use git locally without any remote set up.

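A typical set of commands would then be: 
``` shell 
mkdir myworking_directory
cd myworking_directory            # move into the new directory
git init                          # only the first time you use git in that directory
echo 'Hello' > file.txt           # create or edit your own file
git add file.txt                  # stage the file for the next commit
git commit -m 'Initial commit'    # record the staged changes locally
# no `git push` here: this local workflow has no remote repository
```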

See https://swcarpentry.github.io/git-novice/04-changes/index.html for a first try!
For the course, you may want to create a separate copy of each notebook. A more professional option is to create a working branch in which you keep your own modifications of the lecture.

## References: 

* Two alternative introductions to git:
  - [https://swcarpentry.github.io/git-novice/](https://swcarpentry.github.io/git-novice/)
  - [https://github.com/drphilmarshall/GettingStarted#top](https://github.com/drphilmarshall/GettingStarted#top) 
* Git/GitHub Cheat Sheet:  [https://education.github.com/git-cheat-sheet-education.pdf](https://education.github.com/git-cheat-sheet-education.pdf)