# List of projects for day 1

## Introduction to python

#### Level: Beginner

Take the notebook used for the introduction to Python and play with the examples.
    
## Project Euler

#### Level: Beginner - 

Go to https://projecteuler.net/. It's a website offering a collection of mathy programming challenges, starting from very simple ones and ending with very hard ones. The first 10-20 problems are a great playground for playing with a new programming language. The problems above 50-100 require advanced knowledge of math and algorithms.

Let's have a look at the [first problem](https://projecteuler.net/problem=1):

    Problem 1
    
    If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

    Find the sum of all the multiples of 3 or 5 below 1000.

How can you implement this on a computer? You have to loop over all the numbers below 1000 and check whether each one is
divisible by 3 or 5. If it is, you add the number to a running total.

If you would like to have some more hints, click  <a href="#hinteuler1" data-toggle="collapse">here</a>.

<div id="hinteuler1" class="collapse">
To loop over all the numbers below 1000, you can use a `for i in range(1000)` loop. To check if
a number is divisible by 3 or 5, use an `if` statement; for example, `i % 3 == 0` checks whether `i`
is divisible by 3. Finally, you should define a variable `total = 0` before the loop, and
add numbers to this variable inside the loop!
</div>
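
The approach described above can be sketched as follows. The function name `sum_of_multiples` and its `limit` parameter are choices made for this illustration, not part of the problem statement. The sketch is only checked against the small example from the problem (the numbers below 10).

```python
def sum_of_multiples(limit):
    """Add up all natural numbers below `limit` that are divisible by 3 or 5."""
    total = 0  # running total, defined before the loop
    for i in range(limit):
        # i % 3 == 0 is True exactly when i is divisible by 3
        if i % 3 == 0 or i % 5 == 0:
            total += i
    return total


# The small example from the problem statement: 3 + 5 + 6 + 9
print(sum_of_multiples(10))  # prints 23
```

Wrapping the loop in a function makes it easy to test the small case first before moving on to the real limit.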

In similar ways, you can tackle the other Euler problems!



## Programming for biologists

#### Level: Beginner

Here are two basic problems from a Python programming course especially [aimed at biologists](http://www.programmingforbiologists.org).

### Body mass of dinosaurs

The length of an organism is typically strongly correlated with its body mass. This is useful because it allows us to estimate the mass of an organism even if we only know its length. This relationship generally takes the form:

$$ \text{Mass} = a * \text{Length}^b$$

Where the parameters $a$ and $b$ vary among groups, mass is given in kg, and length in m. This allometric approach is regularly used to estimate the mass of dinosaurs since we cannot weigh something that is only preserved as bones.

Example values of $a$ and $b$ are ([Seebacher 2001](http://www.jstor.org/stable/4524171)):

- [Theropoda](https://en.wikipedia.org/wiki/Theropoda) (e.g. T-rex): $a=0.73$, $b=3.63$
- [Sauropoda](https://en.wikipedia.org/wiki/Sauropoda) (e.g. Brachiosaurus): $a = 214.44$, $b = 1.46$

Write a function, `get_mass_from_length()`, that estimates the mass of an organism in kg based on its length in meters by taking length, a, and b as parameters. To be clear, we want to pass the function all 3 values that it needs to estimate a mass as parameters. This makes it much easier to reuse for all species.
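
A minimal sketch of such a function, following the formula above directly. The parameter order `(length, a, b)` is an assumption of this sketch, and the 1 m animal in the example is hypothetical, chosen because the formula then reduces to $\text{Mass} = a$.

```python
def get_mass_from_length(length, a, b):
    """Estimate mass in kg from length in m using Mass = a * Length**b."""
    return a * length ** b


# Hypothetical example: a 1 m long animal with the Theropoda parameters.
# For Length = 1 the formula reduces to Mass = a.
print(get_mass_from_length(1.0, a=0.73, b=3.63))  # prints 0.73
```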

Use this function to compute the mass of [T-rex Trix](https://t-rex.naturalis.nl/Trix/default.aspx) on display in Naturalis, Leiden, which is 13 meters long. 
Compare to the [Camarasaurus](http://www.naturalis.nl/en/museum/top-pieces/camarasaurus-skeleton/) they have there, too.

### DNA vs RNA

Write a function, `dna_or_rna(sequence)`, that determines if a sequence of base pairs is DNA, RNA, or if it is not possible to tell given the sequence provided. Since all the function will know about the material is the sequence, the only way to tell the difference between DNA and RNA is that RNA has the base Uracil (u) instead of the base Thymine (t). Have the function return one of three outputs: 'DNA', 'RNA', or 'UNKNOWN'. Use the function and a for loop to print the type of the sequences in the following list.


    sequences = ['ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg',
                 'gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau',
                 'gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc',
                 'guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu',
                 'gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc']

Optional: For a little extra challenge, make your function work with both upper and lower case letters, or even strings with mixed capitalization.
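
One possible way to structure such a function is sketched below. Returning 'UNKNOWN' for a sequence that contains both t and u is an assumption of this sketch, since the exercise does not say what to do in that case. The example sequences are short and made up, not the ones from the exercise.

```python
def dna_or_rna(sequence):
    """Return 'DNA', 'RNA', or 'UNKNOWN' for a sequence of bases."""
    bases = sequence.lower()  # handle upper, lower, and mixed case
    has_t = "t" in bases
    has_u = "u" in bases
    if has_t and not has_u:
        return "DNA"
    if has_u and not has_t:
        return "RNA"
    # Neither base present, or (an assumption of this sketch) both present
    return "UNKNOWN"


# Short made-up sequences, not the ones from the exercise
for seq in ["gattaca", "GAUUACA", "gacca"]:
    print(seq, dna_or_rna(seq))
```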

## Cryptography 

### Caesar's cipher - trial and error

#### Level: Beginner

The following text is encrypted by shifting all letters in the alphabet by a fixed amount
(e.g. a shift by 2 would give A -> C, B -> D, ..., Z -> B). Decrypt it by trying out all possible shifts! (Just copy the assignment below to use the text in your Python program.)

    encrypted_text1 = """
    RW. XMJWQTHP MTQRJX, BMT BFX ZXZFQQD AJWD QFYJ NS YMJ RTWSNSLX, XFAJ ZUTS YMTXJ
    STY NSKWJVZJSY THHFXNTSX BMJS MJ BFX ZU FQQ SNLMY, BFX XJFYJI FY YMJ GWJFPKFXY YFGQJ.
    N XYTTI ZUTS YMJ MJFWYM-WZL FSI UNHPJI ZU YMJ XYNHP BMNHM TZW ANXNYTW MFI QJKY GJMNSI
    MNR YMJ SNLMY GJKTWJ. NY BFX F KNSJ, YMNHP UNJHJ TK BTTI, GZQGTZX-MJFIJI, TK YMJ XTWY
    BMNHM NX PSTBS FX F "UJSFSL QFBDJW." OZXY ZSIJW YMJ MJFI BFX F GWTFI XNQAJW GFSI SJFWQD
    FS NSHM FHWTXX. "YT OFRJX RTWYNRJW, R.W.H.X., KWTR MNX KWNJSIX TK YMJ H.H.M.," BFX
    JSLWFAJI ZUTS NY, BNYM YMJ IFYJ "1884." NY BFX OZXY XZHM F XYNHP FX YMJ TQI-KFXMNTSJI
    KFRNQD UWFHYNYNTSJW ZXJI YT HFWWD—INLSNKNJI, XTQNI, FSI WJFXXZWNSL.

    "BJQQ, BFYXTS, BMFY IT DTZ RFPJ TK NY?"
    """

To this end, write a *function* that takes an encrypted text as input as well as a shift, and that then
prints the decrypted string.

If you would like to have some hints, click  <a href="#hintcrypto1" data-toggle="collapse">here</a>.

<div id="hintcrypto1" class="collapse">
Remember, you can loop over all letters in a string using `for c in encrypted_text1:`. Also, you can
check if a letter is between "A" and "Z" with `"A" <= c and c <= "Z"`. One way of doing the substitution is
to generate a dictionary with the encrypted letters as keys and the decrypted letters as values, and then use this to do the substitution in
the `for`-loop.
</div>

A typical mistake is described <a href="#hintcrypto2" data-toggle="collapse">here</a>.

<div id="hintcrypto2" class="collapse">
There's a subtle logical mistake that is often made here: you might be tempted to first replace all `A`'s
in the string with another letter. Say the shift is 1, and that letter is `B`. Then you will have two types of `B`'s in your string:
decrypted ones (from replacing `A`) and still encrypted ones (the `B`'s from the original encrypted string). If you now replace both types of `B`'s in the next step, you run into problems ...

</div>
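
The dictionary idea from the hints can be sketched as follows. The name `decrypt_caesar` is a choice made for this illustration, and the sketch returns the decrypted string instead of printing it, so that the caller decides what to print. Because every character is looked up exactly once in a mapping that is built beforehand, the mistake of replacing a letter twice cannot occur.

```python
import string


def decrypt_caesar(text, shift):
    """Undo a Caesar shift: every letter A-Z is moved back by `shift` places."""
    alphabet = string.ascii_uppercase
    # Build the complete mapping first: encrypted letter -> decrypted letter
    mapping = {}
    for i, letter in enumerate(alphabet):
        mapping[letter] = alphabet[(i - shift) % 26]

    decrypted = ""
    for c in text:
        if "A" <= c and c <= "Z":
            decrypted += mapping[c]  # each letter is looked up exactly once
        else:
            decrypted += c  # leave spaces, digits, and punctuation alone
    return decrypted


# With a shift of 2, A -> C, B -> D, ..., Z -> B, so decrypting reverses this
print(decrypt_caesar("CDB", 2))  # prints ABZ
```

To try out all shifts, one could loop with `for shift in range(26):` and print `decrypt_caesar(encrypted_text1, shift)` for each value.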

### Caesar's cipher - frequency analysis

#### Level: Intermediate

Caesar's cipher can be broken by *frequency analysis*: letters occur in a text with a certain probability. For example, a probability distribution for the English language can be found on [Wikipedia](https://en.wikipedia.org/wiki/Frequency_analysis).

Make a frequency analysis of the text above. Plot a histogram of the letter counts using `print` statements, so that it looks like

    A ****
    B ********
    C **
    
etc. From that, determine the shift of the Caesar's cipher and decrypt the text in one go.


### A step aside: Frequency analysis of English texts

#### Level: Intermediate

In the previous exercise, you were asked to break a cipher using the frequency of letters as given on [Wikipedia](https://en.wikipedia.org/wiki/Frequency_analysis).

You can create such a histogram of letter frequencies yourself. For this, take the text of "Romeo and Juliet" from the file `romeo_and_juliet.txt` (downloaded from [Project Gutenberg](https://www.gutenberg.org/ebooks/1112.txt.utf-8)), and count the occurrences
of all letters A-Z (treating upper case A-Z and lower case a-z the same). Do you find the same result as in the Wikipedia article?
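
One way to turn letter counts into relative frequencies is sketched below. The function names are choices made for this illustration, and opening the file with UTF-8 encoding is an assumption based on the download link. The sketch assumes the text contains at least one letter. To keep it runnable without the file, the example uses a tiny made-up string.

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter A-Z, ignoring case."""
    counts = Counter(c for c in text.upper() if "A" <= c <= "Z")
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def frequencies_from_file(path):
    """Read a text file (assumed to be UTF-8) and return its letter frequencies."""
    with open(path, encoding="utf-8") as f:
        return letter_frequencies(f.read())


# Tiny made-up example; for the exercise one would call
# frequencies_from_file("romeo_and_juliet.txt") instead.
freqs = letter_frequencies("Abba")
for letter, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{letter} {100 * freq:.1f}%")
```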

## Substitution cipher

#### Level: Intermediate - Advanced

Now let's look at a more complicated cipher where we replace (uniquely) every letter by some other letter. This is not just described by a simple shift as before, but now we need to find a different mapping for every letter!

It is a bit difficult to do this by frequency analysis of single letters (although you might identify certain letters). It is easier to do this by doing a frequency analysis of [bigrams](https://en.wikipedia.org/wiki/Bigram#Bigram_frequency_in_the_English_language), i.e. two letter combinations.
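
Counting bigrams can be sketched as follows. The function name `count_bigrams` is a choice made for this illustration. Counting only pairs of adjacent letters inside a word, so that a pair never spans a space or punctuation mark, is an assumption of this sketch. The example text is made up.

```python
from collections import Counter


def count_bigrams(text):
    """Count pairs of adjacent letters; pairs never span spaces or punctuation."""
    counts = Counter()
    text = text.upper()
    # zip pairs every character with its right-hand neighbour
    for first, second in zip(text, text[1:]):
        if "A" <= first <= "Z" and "A" <= second <= "Z":
            counts[first + second] += 1
    return counts


# Made-up example text
bigrams = count_bigrams("the weather there")
for bigram, n in bigrams.most_common(3):
    print(bigram, n)
```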

Write code to find the frequencies of single letters and the 10 most frequent bigrams. Use this as input to decipher the following text:

    encrypted_text2 = """
      "IEGG ZE, DMISTW, DBMI FT RTN ZMOE TQ TNY LPSPITY'S SIPHO? SPWHE DE BMLE XEEW ST
    NWQTYINWMIE MS IT ZPSS BPZ MWF BMLE WT WTIPTW TQ BPS EYYMWF, IBPS MHHPFEWIMG STNLEWPY
    XEHTZES TQ PZJTYIMWHE. GEI ZE BEMY RTN YEHTWSIYNHI IBE ZMW XR MW EUMZPWMIPTW TQ PI."
      "P IBPWO," SMPF P, QTGGTDPWV MS QMY MS P HTNGF IBE ZEIBTFS TQ ZR HTZJMWPTW, "IBMI FY.
    ZTYIPZEY PS M SNHHESSQNG, EGFEYGR ZEFPHMG ZMW, DEGG-ESIEEZEF SPWHE IBTSE DBT OWTD BPZ
    VPLE BPZ IBPS ZMYO TQ IBEPY MJJYEHPMIPTW."
      "VTTF!" SMPF BTGZES. "EUHEGGEWI!"
      "P IBPWO MGST IBMI IBE JYTXMXPGPIR PS PW QMLTNY TQ BPS XEPWV M HTNWIYR JYMHIPIPTWEY
    DBT FTES M VYEMI FEMG TQ BPS LPSPIPWV TW QTTI."
      "DBR ST?"
      "XEHMNSE IBPS SIPHO, IBTNVB TYPVPWMGGR M LEYR BMWFSTZE TWE BMS XEEW ST OWTHOEF MXTNI
    IBMI P HMW BMYFGR PZMVPWE M ITDW JYMHIPIPTWEY HMYYRPWV PI. IBE IBPHO-PYTW QEYYNGE PS
    DTYW FTDW, ST PI PS ELPFEWI IBMI BE BMS FTWE M VYEMI MZTNWI TQ DMGOPWV DPIB PI."
      "JEYQEHIGR STNWF!" SMPF BTGZES.
      "MWF IBEW MVMPW, IBEYE PS IBE 'QYPEWFS TQ IBE H.H.B.' P SBTNGF VNESS IBMI IT XE IBE
    STZEIBPWV BNWI, IBE GTHMG BNWI IT DBTSE ZEZXEYS BE BMS JTSSPXGR VPLEW STZE SNYVPHMG
    MSSPSIMWHE, MWF DBPHB BMS ZMFE BPZ M SZMGG JYESEWIMIPTW PW YEINYW."
      "YEMGGR, DMISTW, RTN EUHEG RTNYSEGQ," SMPF BTGZES, JNSBPWV XMHO BPS HBMPY MWF GPVBIPWV
    M HPVMYEIIE. "PWIEYESIPWV, IBTNVB EGEZEWIMYR," SMPF BE MS BE YEINYWEF IT BPS QMLTNYPIE
    HTYWEY TQ IBE SEIIEE. "IBEYE MYE HEYIMPWGR TWE TY IDT PWFPHMIPTWS NJTW IBE SIPHO.
    PI VPLES NS IBE XMSPS QTY SELEYMG FEFNHIPTWS."
      "BMS MWRIBPWV ESHMJEF ZE?" P MSOEF DPIB STZE SEGQ-PZJTYIMWHE. "P IYNSI IBMI IBEYE PS
    WTIBPWV TQ HTWSEANEWHE DBPHB P BMLE TLEYGTTOEF?"
      "P MZ MQYMPF, ZR FEMY DMISTW, IBMI ZTSI TQ RTNY HTWHGNSPTWS DEYE EYYTWETNS."
    """

Hint: Don't try to do everything in one go. Once you have identified certain letters, substitute those in the text, and then guess other letters. In this way, decrypt the text step by step.
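
The step-by-step approach could be supported by a small helper like the one sketched below. The name `apply_partial_key` and the convention of writing decrypted letters in lower case (so they cannot be confused with letters that are still encrypted) are choices made for this illustration. The key and the ciphertext in the example are made up and unrelated to the text above.

```python
def apply_partial_key(text, key):
    """Substitute only the letters found in `key`, leaving all others as they are.

    Decrypted letters are written in lower case, so upper case letters in
    the result are the ones that still need to be identified.
    """
    result = ""
    for c in text:
        if c in key:
            result += key[c].lower()
        else:
            result += c  # unknown letters, spaces, and punctuation stay unchanged
    return result


# Hypothetical guesses for a made-up ciphertext (not the text above)
key = {"X": "T", "Q": "H", "K": "E"}
print(apply_partial_key("XQK XQKZ, XQKVK", key))  # prints the theZ, theVe
```

Each time a new letter is guessed, it can be added to the dictionary and the text printed again.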

## Babynames

#### Level: Intermediate

The file `babynames.txt` contains data on all names given to male children in the Netherlands in 2015 (from [SVB](https://www.svb.nl/int/nl/kindernamen/artikelen/top20/jongens/index.jsp)). I apologize for only giving boys' names; for some reason the SVB didn't provide a file with all girls' names.

Read the file and do some analysis of the data:

- Find the most common name. What percentage of Dutch male children were given this name in 2015?
- Find the shortest and the longest name.
- Find the name with the most special characters ("special" = not A-Z).
- Find the top 20 names, and give the percentage of children that received each of them. What is the total percentage of the top 20 names, i.e. how often is a top 20 name given?
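
The kind of analysis asked for above might look like the following sketch. The layout of `babynames.txt` is not described here, so the sketch does not read the file. Instead it assumes the data has already been loaded into a dictionary mapping each name to the number of children, and the names and counts below are made up. Treating lower case a-z like A-Z when counting special characters is also an assumption.

```python
def count_special(name):
    """Count characters in `name` that are not letters A-Z (ignoring case)."""
    return sum(1 for c in name.upper() if not ("A" <= c <= "Z"))


def percentage(count, total):
    """Express `count` as a percentage of `total`."""
    return 100 * count / total


# Made-up data for illustration only: name -> number of children
name_counts = {"Jan": 50, "Jean-Pierre": 5, "Bo": 20, "Daan": 25}

total = sum(name_counts.values())
most_common = max(name_counts, key=name_counts.get)
shortest = min(name_counts, key=len)
longest = max(name_counts, key=len)
most_special = max(name_counts, key=count_special)

print(most_common, percentage(name_counts[most_common], total))  # prints Jan 50.0
print(shortest, longest, most_special)

# Top names by count (top 2 here, top 20 in the exercise) and their combined share
top = sorted(name_counts, key=name_counts.get, reverse=True)[:2]
print(top, percentage(sum(name_counts[n] for n in top), total))
```

Note that `min` and `max` return only the first match, so ties between names would need extra handling.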

## Find the zero of a function

#### Level: Intermediate to Advanced

Write a function to find the zero of a mathematical function $f(x)$ within an interval $[a, b]$, when $f(a)$ and $f(b)$ have opposite signs (this makes sure there is at least one zero in this interval, provided $f$ is continuous).

Find this zero by *bisecting* the interval: take the middle point $c$, and find out which half of the interval must contain the zero by comparing the signs of $f(a)$, $f(b)$, and $f(c)$. Iterate this procedure to zoom in on
the zero.

Write a Python function `find_root(f, a, b)` that implements this procedure. This Python function should take another Python function `f` as input, as well as two numbers `a` and `b`.

*Possible extensions, optional*: 
- Have another input parameter specifying the numerical accuracy with which you want to find the zero.
- Check the user input to make sure `find_root` can actually find a root.
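
The bisection procedure, including both optional extensions, can be sketched as follows. The parameter name `tolerance`, its default value, and the choice of raising a `ValueError` for invalid input are assumptions of this sketch.

```python
def find_root(f, a, b, tolerance=1e-10):
    """Find a zero of `f` in [a, b] by bisection.

    Requires f(a) and f(b) to have opposite signs (or one of them to be zero).
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa > 0) == (fb > 0):
        raise ValueError("f(a) and f(b) must have opposite signs")

    while abs(b - a) > tolerance:
        c = (a + b) / 2  # middle point of the current interval
        if c == a or c == b:
            break  # the interval cannot shrink further in floating point
        fc = f(c)
        if fc == 0:
            return c
        if (fa > 0) != (fc > 0):
            b, fb = c, fc  # sign change between a and c
        else:
            a, fa = c, fc  # sign change between c and b
    return (a + b) / 2


# Example: the positive zero of x**2 - 2 is the square root of 2
print(find_root(lambda x: x**2 - 2, 0, 2))
```

Each iteration halves the interval, so about 35 iterations are enough to shrink an interval of length 2 below the default tolerance.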