<p style='text-align: center'><a href=https://www.biozentrum.uni-wuerzburg.de/cctb/research/supramolecular-and-cellular-simulations/>Supramolecular and Cellular Simulations</a> (Prof. Fischer)<br>Center for Computational and Theoretical Biology - CCTB<br>Faculty of Biology, University of Würzburg</p>

<p style='text-align: center'><br><br>We are looking forward to your comments and suggestions. Please send them to <a href=mailto:sabine.fischer@uni.wuerzburg.de>sabine.fischer@uni.wuerzburg.de</a><br><br></p>

<h1><p style='text-align: center'> Introduction to Python </p></h1>

## Import and Export

Content:<br>
- data stream
- reading data from file
- writing data to file
- string formatting
- listing directory content
- json
- pandas
- exporting plots to a file

### 1. Data stream
A data stream is a continuous sequence of data. A program reads data from incoming data streams (downstreams) and writes data to outgoing data streams (upstreams). Keyboard input, output printed to the screen, and files are all examples of data streams.  

Standard data streams are used through `print()` and `input()`.

In [None]:
print('text')

In [None]:
a=input()

In [None]:
a
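
The screen is only one of several possible output streams. As a minimal sketch of this idea, `print()` accepts a `file` argument and writes to any object that behaves like an output stream. The example uses `sys.stderr` and an in-memory `io.StringIO` buffer; the variable names `buffer` and `captured` are chosen for illustration only.

```python
import io
import sys

# print() writes to the standard output stream (sys.stdout) by default
print('text')

# the same call can target another stream, e.g. standard error
print('warning', file=sys.stderr)

# an in-memory text stream behaves like a file opened for writing
buffer = io.StringIO()
print('captured', file=buffer)
captured = buffer.getvalue()   # the text that was written, including the newline
```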

### 2. Reading data from a file

In [None]:
fObj=open('dictionary.txt','r')

In [None]:
for line in fObj:
    print(line)

In [None]:
fObj.close()

In [None]:
fObj.closed

In [None]:
# the file is closed, so this raises "ValueError: I/O operation on closed file"
for line in fObj:
    print(line)

In [None]:
words={}
fObj=open('dictionary.txt','r')
for line in fObj:
    lineParts=line.split()   # split on whitespace; this also drops the trailing newline
    words[lineParts[0]] = lineParts[1]
fObj.close()    

In [None]:
words

Using `with` provides an alternative that does not require explicitly closing the file object: the file is closed automatically when the block ends.

In [None]:
with open('dictionary.txt','r') as fObj:
    for line in fObj:
        print(line)
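
The following sketch shows two ways of reading a file, `read()` for the whole content and line-by-line iteration, and checks that the file is closed after the `with` block. To keep it self-contained it first writes a small example file to a temporary directory; the file name and its content are made up for illustration.

```python
import os
import tempfile

# create a small example file (hypothetical content, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as fObj:
    fObj.write('cat Katze\ndog Hund\n')

# read() returns the whole file content as one string
with open(path, 'r') as fObj:
    content = fObj.read()

# iterating yields one line at a time; rstrip removes the trailing newline
lines = []
with open(path, 'r') as fObj:
    for line in fObj:
        lines.append(line.rstrip('\n'))

isClosed = fObj.closed   # True: the with block has closed the file
```

Each line read from a file still ends with a newline character, which is why `print(line)` produces an empty line after every entry.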

### 3. Writing data to a file

In [None]:
fObj=open('file.txt','w')

In [None]:
fObj.write("This is a text")

In [None]:
fObj.close()

In [None]:
nucleiData={'centroid': [110,112,240],'volume': 250,'cellType': 'stem_cell'}

In [None]:
with open ('results.txt','w') as fObj:
    for entry in nucleiData:
        fObj.write("{} {}\n".format(entry,nucleiData[entry]))
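
The mode passed to `open()` decides what happens to existing content. As a small sketch, assuming a made-up file in a temporary directory: `'w'` empties the file before writing, while `'a'` appends to its end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')   # hypothetical file name

with open(path, 'w') as fObj:      # 'w' creates the file or empties an existing one
    fObj.write('first\n')

with open(path, 'w') as fObj:      # opening with 'w' again discards 'first'
    fObj.write('second\n')

with open(path, 'a') as fObj:      # 'a' appends to the end of the file
    fObj.write('third\n')

with open(path, 'r') as fObj:
    result = fObj.read()
```

Note that `write()` does not add a newline by itself; it has to be part of the string.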

### 4. String formatting

In [None]:
str(4)

In [None]:
smallerNumber= 4
largerNumber=7

In [None]:
'The number '+str(smallerNumber)+ ' is smaller than ' + str(largerNumber) 

In [None]:
f'The number {smallerNumber} is smaller than {largerNumber}'

The method `str.format()` replaces each placeholder `{}` in the string with the corresponding argument:

In [None]:
print('We are the {} who say "{}!"'.format('knights', 'Hi'))

In a format string you can refer to arguments by their index, starting with {0}, or by name:

In [None]:
"The number {0} is greater than {1}, but {0} is less than {largest_num}".format(6,5,largest_num=7)
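
Placeholders can also carry a format specification after a colon, both in `str.format()` and in f-strings. The sketch below shows a few common cases; the variable names and values are made up for illustration.

```python
volume = 250.0
fraction = 0.12345

# fixed number of decimal places
formattedVolume = '{:.2f}'.format(volume)      # '250.00'

# percentage with one decimal place
formattedFraction = f'{fraction:.1%}'          # '12.3%'

# right-align within a field of 8 characters
padded = '{:>8}'.format('cell')                # '    cell'

# integer padded with leading zeros to 3 digits
paddedNumber = f'{7:03d}'                      # '007'
```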

`str` provides readable output, while `repr` provides the 'official' representation. The latter is useful for debugging.<br>
Rule of thumb: `str` is for customers, `repr` is for developers.

In [None]:
import datetime

In [None]:
print(str(datetime.datetime.now() ))

In [None]:
print(repr(datetime.datetime.now() ))
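
Because `datetime.datetime.now()` changes with every call, the difference is easier to compare with a fixed value. The sketch below uses an arbitrary example date and a short string; both values are chosen for illustration only.

```python
import datetime

timePoint = datetime.datetime(2020, 1, 2, 3, 4, 5)   # fixed example value

strTime = str(timePoint)     # '2020-01-02 03:04:05'
reprTime = repr(timePoint)   # 'datetime.datetime(2020, 1, 2, 3, 4, 5)'

text = 'line\n'
strText = str(text)      # the string itself, unchanged
reprText = repr(text)    # includes the quotes and shows the newline as \n
```

The `repr` of the datetime object is valid Python code that would recreate the object, which is what makes it useful for debugging.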

### 5. Listing directory content

In [None]:
import os

Get the current working directory.

In [None]:
os.getcwd()

In [None]:
currentPath=os.getcwd()

In [None]:
os.listdir(path=currentPath)

https://docs.python.org/3/library/os.html#os-file-dir
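
`os.listdir()` returns names only, not full paths. As a minimal sketch, the example below combines it with `os.path.join()` and `os.path.isdir()` to pick out the subdirectories. It builds its own small directory in a temporary location; the names `rawData` and `notes.txt` are made up for illustration.

```python
import os
import tempfile

# build a small example directory (hypothetical names, for illustration only)
examplePath = tempfile.mkdtemp()
os.mkdir(os.path.join(examplePath, 'rawData'))
with open(os.path.join(examplePath, 'notes.txt'), 'w') as fObj:
    fObj.write('example')

# os.listdir returns the entry names in arbitrary order, so sort them
entries = sorted(os.listdir(examplePath))

# os.path.join builds the full path that os.path.isdir needs
subDirectories = [name for name in entries
                  if os.path.isdir(os.path.join(examplePath, name))]
```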

### 6. json (JavaScript Object Notation)

The JSON format provides an easy way to save more complex data types like nested lists and dictionaries. The data hierarchies are converted to string representations for export (serializing) and reconstructed from the string representation for input (deserializing).  <br>
It is commonly used in modern applications and therefore a good choice for interoperability.

In [None]:
import json

**Serializing**

In [None]:
json.dumps(words)

In [None]:
file=open('dictionary_json.txt','w')

In [None]:
json.dump(words,file)

In [None]:
file.close()

In [None]:
with open('dictionary_json_2.txt','w') as fileObj:
    json.dump(words,fileObj)

**Deserializing**

In [None]:
with open('dictionary_json_2.txt','r') as fileObj:
    x=json.load(fileObj)

In [None]:
x
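
A round trip through JSON does not always return exactly the same Python types. The sketch below uses `json.dumps()` and `json.loads()`, which work on strings instead of files, with example data modelled on `nucleiData` from section 3 but with a tuple as centroid.

```python
import json

# example data with a tuple (for illustration only)
nucleus = {'centroid': (110, 112, 240), 'volume': 250, 'cellType': 'stem_cell'}

serialized = json.dumps(nucleus)    # a str in JSON format
restored = json.loads(serialized)   # back to Python objects

# JSON has no tuple type: the tuple comes back as a list
centroidType = type(restored['centroid'])

# dictionary keys are always strings in JSON
keysRestored = json.loads(json.dumps({1: 'one'}))
```

Here `restored['centroid']` is the list `[110, 112, 240]`, and the integer key `1` comes back as the string `'1'`.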

### 7. Pandas

In [None]:
import pandas as pd

Pandas is particularly useful for data analysis. It provides two main data types: `Series` (one-dimensional) and `DataFrame` (two-dimensional, table-like).

In [None]:
s = pd.Series([1, 3, 5, 6, 8])

In [None]:
s

In [None]:
df = pd.DataFrame({'DataSet': 1.,
                   'Date': pd.Timestamp('20130102'),
                   'Area': [15,12,13,27],
                   'Category': ["test", "train", "test", "train"],
                   'Validity': 'valid'})

In [None]:
df

Writing to a .csv file

In [None]:
df.to_csv('dataFrame.csv')

Reading from a .csv

In [None]:
fromCSV=pd.read_csv('dataFrame.csv')

In [None]:
fromCSV

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
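
A .csv file stores plain text, so type information such as dates is not preserved automatically. As a sketch, the example below writes a reduced version of the data frame to an in-memory buffer (an assumption made here to avoid creating a file) and reads it back twice, once with default settings and once with the `parse_dates` parameter of `pd.read_csv()`.

```python
import io
import pandas as pd

df = pd.DataFrame({'Date': pd.Timestamp('20130102'),
                   'Area': [15, 12, 13, 27]})

# write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer)

buffer.seek(0)                                       # rewind before reading
plain = pd.read_csv(buffer)                          # Date is read as text

buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['Date'])   # Date is converted to datetime
```

Comparing `plain.dtypes` and `parsed.dtypes` shows the difference between the two imports.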

### 8. Exporting plots to a file

In [None]:
import matplotlib.pyplot as plt

In [None]:
category=['A', 'B', 'C', 'D']
values=[44,55,32,41]
error=[5,8,7,9]
plt.bar(category, values, color=['red','blue','green','orange'], width=0.8, yerr=error, capsize=3, edgecolor='black', lw=2)
plt.title('first barplot')
plt.xlabel('category')
plt.ylabel('value')
plt.savefig('barplot.png')

In [None]:
fig1 = plt.figure()
ax = fig1.add_axes([0.1, 0.1, 0.8, 0.8])

# draw lines
l1, = ax.plot([0.1, 0.5, 0.9], [0.1, 0.9, 0.5], "bo-",
              mec="b", lw=5, ms=10, label="Line 1")
# the format string "rs-" already sets the color to red
l2, = ax.plot([0.1, 0.5, 0.9], [0.5, 0.2, 0.7], "rs-",
              mec="r", lw=5, ms=10, label="Line 2")
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis');

In [None]:
fig1.savefig('lines.png')

In [None]:
fig1.savefig('lines.pdf')
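
`savefig()` derives the output format from the file extension and accepts further options. The sketch below uses two of them, `dpi` and `bbox_inches`, and writes to a temporary directory; the file name `lines_highres.png` is made up for illustration.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0.1, 0.5, 0.9], [0.1, 0.9, 0.5], 'bo-')
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')

outputPath = os.path.join(tempfile.mkdtemp(), 'lines_highres.png')   # hypothetical file name

# dpi sets the resolution; bbox_inches='tight' trims surrounding whitespace
fig.savefig(outputPath, dpi=300, bbox_inches='tight')

plt.close(fig)   # release the figure once it has been saved
```

A higher `dpi` value matters for raster formats such as .png; vector formats such as .pdf can be scaled without loss.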

### 9. Summary

- `fileObject.read()` and `fileObject.write()` read and write data as strings from and to a file
- json provides an easy way of storing and reading lists, tuples and dictionaries (tuples are read back as lists)
- pandas has useful functionality for reading and storing data in table form

## Further reading

Input, output and string formatting in Python: https://docs.python.org/3/tutorial/inputoutput.html

More information on `open`: https://docs.python.org/3/library/functions.html#open 

More information on json: https://docs.python.org/3/library/json.html#module-json

More information on `pandas.read_csv()`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html <br>
`pandas.DataFrame.to_csv()`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html <br>
`pandas.read_excel()`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html <br>
`pandas.DataFrame.to_excel()`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

## Exercises

<p style='color: green'>easy</p>
1. Generate a string and write it to a .txt file

<p style='color: green'>easy</p>
2. Read the string from the .txt file that you have generated in the previous exercise.

<p style='color: green'>easy</p>
3. Prompt a user to input two numbers using the function `input()` and output the sum of the two numbers using `print()` and `.format()`.

<p style='color: green'>easy</p>
4. For the function `pd.read_csv()`, find the right value for the parameter `index_col`, such that the variable `fromCSV` does not contain the column *Unnamed: 0*.

<p style='color: green'>easy</p>
5. Use the values for x and y as given below. Generate a line plot and export the figure as a .png.

In [None]:
import math
xValues=[2,4,7,9,13,15]
yValues=[math.exp(x) for x in xValues]

<p style='color: green'>easy</p>
6. Import the dataset "DrugScreen1" (DrugScreen1.csv). Before importing, check which row of the .csv file contains the header.

<p style='color: green'>easy</p>
7. Load the text from the file AliceInWonderland.txt and count the number of words.

<p style='color: green'>easy</p>
8. Get the number of files in the current working directory.

<p style='color: orange'>medium</p>
9. Count the number of .txt files in the current working directory.

<p style='color: orange'>medium</p>
10. Import the dataset drugScreen2 from the file DrugScreen2.csv. Check that the import worked correctly. If not, adjust the import parameters.

<p style='color: orange'>medium</p>
11. Test which one is faster, list comprehension or for-loop.

<p style='color: orange'>medium</p>
12. Write code for looking up translations in the dictionary `words`. A user should be prompted to input an English word. If it is one of the words in the dictionary, the German translation should be displayed on the screen. If the word is not contained in the dictionary, a corresponding notice should be displayed on the screen.

<p style='color: red'>hard</p>
13. Import the dataset "Weights" (Weights.csv). Plot a histogram of the weights and export it as a PDF.