# User input and textfiles



## Contents
  
* User input (sys)
* User input (input)
* Useful string operations
* Reading data from files
* Writing data to file
* Errors/exceptions

## Command line arguments:

Command line arguments are "words" written after the program name when you run it, e.g. 
```bash
python3 hello_world.py 10
```
Here the command line argument is 10. If you are running from Spyder in Anaconda (or any IPython environment), put an exclamation mark in front of the command to pass command line arguments:

```python
!python3 hello_world.py 10
```

## Recap of the sys.argv list:

sys is a module in the Python standard library and may be imported by
```python
import sys
```



### The first example (test_sys.py):


```python

import sys
print(f"In this program, {sys.argv[1]} is the command line argument")

```

In [None]:
!python3 test_sys.py 10

### Several command line arguments (test_sys2.py)

The "magic" sys.argv[1] refers to the list sys.argv, which consists of all the words (separated by spaces) given on the command line, including the program name. The program name is sys.argv[0], so the first argument is sys.argv[1].

```python
import sys

sys_argv_list = sys.argv
cmd_args = sys.argv[1:]

print("The sys.argv list looks like this: ", sys_argv_list)
print("But we are only interested in these arguments: ", cmd_args)
```

In [None]:
!python3 test_sys2.py 10 20 30 40
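
The words in sys.argv are always strings, so numbers given on the command line must be converted before we can calculate with them. A minimal sketch, where the function name sum_arguments and the hard-coded list example_argv are made up for illustration (in a real program we would pass sys.argv instead):

```python
def sum_arguments(argv):
    """Return the sum of the command line words after the program name."""
    # argv[0] is the program name, so the arguments start at index 1.
    return sum(float(word) for word in argv[1:])


# Hypothetical stand-in for sys.argv after running
#   python3 test_sys2.py 10 20 30 40
example_argv = ["test_sys2.py", "10", "20", "30", "40"]

total = sum_arguments(example_argv)
print(total)
```

Taking the list as a function argument makes it easy to try the conversion without starting the program from the command line.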

# The input function
Another way of getting user information is the input function. The user does not need to provide command line arguments, but can reply to questions from the program: 

In [None]:
number = input('Write a number:')
print(f'Your number is {number}')

In [None]:
numbers = input("Write many numbers separated by spaces")
print("Your numbers are", numbers)

In [None]:
numbers = input("Write many numbers separated by spaces")
print("Your numbers are", numbers.split())

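The final example gives a result quite similar to sys.argv[1:]. I prefer sys.argv: it is quicker to run a program when all the arguments are given on the command line than to answer questions one at a time.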

## Useful string operations 

In [None]:
info = input("Write the current day, date and time separated by spaces")
infolist = info.split()
day = infolist[0]
date = infolist[1:-1]
time = infolist[-1]
print(f"Today is {day}. The date is {date}. The time is {time}")

In [None]:
info = input("Write the current day, date and time separated by commas")
infolist = info.split(',')
day = infolist[0]
date = infolist[1]
time = infolist[2]
print(f"Today is {day}. The date is {date}. The time is {time}")
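
When the input is split on commas, any space typed after a comma stays in the pieces. A small sketch of cleaning the pieces with strip; the string reply is a made-up stand-in for what a user might type at the prompt:

```python
# Made-up reply, standing in for the return value of input(...)
reply = "Monday, 09 September, 12:00"

raw_parts = reply.split(',')
print(raw_parts)    # the second and third piece start with a space

# strip() removes whitespace at both ends of each piece
parts = [part.strip() for part in raw_parts]
day, date, time = parts
print(f"Today is {day}. The date is {date}. The time is {time}")
```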

## The join method
Define the list
```python
date = ['09', 'September']
```
Can you extract the info from the list into one string saying only "09 September"?

In [None]:
' '.join(date)

## Adding strings

In [None]:
date[0]+' '+date[1]

# Reading from textfiles (.txt, .dat etc.)

To open a datafile in the same location as the current .py-file use the syntax open(filename) where filename is the name of the datafile as a string:
```python
infile = open('example_data.txt')
```
Here example_data.txt may look like this: 
```bash
This is the first line of the file
This is the second line of the file
Below comes the interesting part of the file: 
10 20 30 
20 30 1
2.2 125 6.45
0.1 20 3.14
```

In [None]:
infile = open('example_data.txt')
infile

We can read the file line by line by using the method readline:

In [None]:
line1 = infile.readline()
line2 = infile.readline()
print(line1)
print(line2)

If we are not interested in the first (few) lines we can call infile.readline() a few times to skip those lines. 

In [None]:
infile = open('example_data.txt')
infile.readline()
infile.readline()
line3 = infile.readline()
print(line3)

The file object (a TextIOWrapper) can be iterated over, and the iteration starts at the current position in the file. We have called infile.readline() three times since we last opened the file, so the first three lines are skipped in the for loop below:

In [None]:
for line in infile:
    print(line)

In [None]:
infile = open('example_data.txt')
lines = infile.readlines()
print(lines)

Now assume that we wanted to store the numbers from the file in three lists/columns: c1, c2 and c3. In the end we should end up with: 
```python
c1 = [10, 20, 2.2, 0.1]
c2 = [20, 30, 125, 20]
c3 = [30, 1, 6.45, 3.14]
```
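
The basic building block for this is to turn one line of text into numbers. A minimal sketch for a single line, where the line is written directly in the code instead of being read from the file:

```python
line = "2.2 125 6.45\n"   # one data line, as readline() would return it

words = line.split()                      # split() also drops the newline
values = [float(word) for word in words]  # convert each word to float
print(values)
```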

## Exercise:
1) Read the file "GRA4157/lectures/02-python-summary2/example_data.txt" in python

2) Create empty lists c1, c2, and c3. Then iterate over the infile and add the first number in each line to c1, the second number in each line to c2 and the third number to c3. The type of objects in the lists should be float. 

## Useful string operations 2:

The string methods startswith and endswith, together with the operator in, are useful when reading files. 

example_data2.txt
```bash
This is a header
This is a header
Numbers: 1 2 3
Numbers: 2 3 4
5 6 7
```
1) We are only interested in the lines that start with "Numbers":

In [None]:
infile = open('example_data2.txt')
for line in infile:
    if line.startswith('Numbers'):
        print(line)

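2) We are only interested in the lines that do not end with "header". Each line read from a file ends with a newline character, so we either include \n in the test or strip the line first: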

In [None]:
infile = open('example_data2.txt')
for line in infile:
    if not line.endswith('header\n'):
        print(line)

In [None]:
infile = open('example_data2.txt')
for line in infile:
    if not line.strip().endswith('header'):
        print(line)

3) We are only interested in lines that have the number 2 in them

In [None]:
infile = open('example_data2.txt')
for line in infile:
    if '2' in line:
        print(line)
infile.close()
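
The three tests can also be tried directly on strings. The sketch below uses a hard-coded list of lines in place of the file, and repr makes the newline character at the end of each line visible:

```python
# Some lines of example_data2.txt as they look when read from the file
lines = [
    "This is a header\n",
    "Numbers: 1 2 3\n",
    "5 6 7\n",
]

for line in lines:
    print(repr(line),
          line.startswith('Numbers'),
          line.endswith('header'),           # always False: '\n' comes last
          line.strip().endswith('header'),
          '2' in line)

# Keep only the header lines, using the strip() variant of the test
header_lines = [line for line in lines if line.strip().endswith('header')]
print(header_lines)
```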

# Writing to file:

To write to file, we still use the open() function, but we have to specify that we want to write to file. 
```python
outfile = open('outfile.txt','w')
```
The mode, here "w", indicates that we want to write to file. The default value (when nothing is provided as in the previous examples) indicates that we want to read from file. Warning: If the file outfile.txt exists, everything will be overwritten by what we decide to write to the file. 

In [None]:
outfile = open('outfile.txt','w')
outfile.write('This is the first line of the file')
outfile.close()

In [None]:
outfile = open('outfile.txt','w')
outfile.write('The previous line is deleted and this is the new line')
outfile.close()

We can append to existing files using the 'a' mode when opening:

In [None]:
outfile = open('outfile.txt','a')
outfile.write('The previous text is still there, and this line was just appended')
outfile.close()

Use \n to start a new line. The write method does not add one automatically.

In [None]:
outfile = open('outfile.txt','a')
outfile.write("Now let's add a new line:\n")
outfile.write("This is the new line")
outfile.close()

Normally we perform all the operations we want on a file before closing it. In the previous examples we closed the file after each operation so that we could inspect the changes as we wrote. 
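
A minimal sketch of several writes followed by a single close, and of reading the file back to check the result. The file name outfile_demo.txt and the temporary directory are chosen only for this example, so that no existing file is overwritten:

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory
filename = os.path.join(tempfile.mkdtemp(), 'outfile_demo.txt')

outfile = open(filename, 'w')
outfile.write('x y\n')            # write() does not add a newline by itself
for x in [1, 2, 3]:
    outfile.write(f'{x} {2*x}\n')
outfile.close()                   # close once, when everything is written

infile = open(filename)
content = infile.read()           # the whole file as one string
infile.close()
print(content)
```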

## Exercise: write a table to file
Assume that we have 10 numbers in a python list: [1,2,3,4,5,6,7,8,9,10]. 
Write a file that contains the numbers as a column and another column with the square root of the given number. The file should look like this: 
```bash
x sqrt(x)
1 1
2 1.41
...
```
You can decide how many decimals to use and how to format sqrt(x) in the file. 
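
As a hint for the formatting, f-strings accept a format specification after a colon. A short sketch with an arbitrary example value:

```python
value = 3.14159   # arbitrary example value

two_decimals = f"{value:.2f}"   # fixed-point notation with 2 decimals
scientific = f"{value:.3e}"     # scientific notation with 3 decimals
padded = f"{value:8.3f}"        # 3 decimals, right-aligned in 8 characters

print(two_decimals)
print(scientific)
print(padded)
```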

## Starting to work with bigger data sets
Example: How do people spend their time? 
<img src="img/Timeuse.png" style="width: 90%; margin: auto;">



I have exported the data to a .txt file: 
```text
,Country,Category,Time (minutes)
0,Australia,Paid work,211.146629603892
1,Austria,Paid work,279.53226810278
2,Belgium,Paid work,194.476452188763
3,Canada,Paid work,268.660609647898
4,Denmark,Paid work,199.771595915566
...
```


Let us now assume that we are only interested in how much people in a given country work (Paid work). How would you extract this information? The file has 462 lines, so manual reading is not effective.

In [1]:
infile = open('data/Time-use.txt')
for line in infile:
    if "Paid work" in line:
        print(line)

0,Australia,Paid work,211.146629603892

1,Austria,Paid work,279.53226810278

2,Belgium,Paid work,194.476452188763

3,Canada,Paid work,268.660609647898

4,Denmark,Paid work,199.771595915566

5,Estonia,Paid work,230.788021522939

6,Finland,Paid work,200.047879

7,France,Paid work,170.060642467443

8,Germany,Paid work,223.839012333333

9,Greece,Paid work,187.368060303144

10,Hungary,Paid work,199.446772180411

11,Ireland,Paid work,231.22452

12,Italy ,Paid work,148.894551667369

13,Japan,Paid work,325.711372400686

14,Korea,Paid work,287.596845923687

15,Latvia,Paid work,316.55785239539

16,Lithuania,Paid work,303.647286983591

17,Luxembourg,Paid work,247.225461265374

18,Mexico,Paid work,302.333564

19,Netherlands,Paid work,217.630356318106

20,New Zealand,Paid work,241.0

21,Norway ,Paid work,200.78622

22,Poland,Paid work,229.279299471607

23,Portugal,Paid work,258.797202797203

24,Slovenia,Paid work,226.685159500693

25,Spain,Paid work,175.7783

26,Sweden,Paid work,262.187136990862

...

In [5]:
infile = open('data/Time-use.txt')
work = {}
for line in infile:
    if "Paid work" in line:
        info = line.split(',')
        country = info[1]
        minutes = info[-1]    # the last column is the time in minutes
        work[country] = float(minutes.strip())
print(work)

{'Australia': 211.146629603892, 'Austria': 279.53226810278, 'Belgium': 194.476452188763, 'Canada': 268.660609647898, 'Denmark': 199.771595915566, 'Estonia': 230.788021522939, 'Finland': 200.047879, 'France': 170.060642467443, 'Germany': 223.839012333333, 'Greece': 187.368060303144, 'Hungary': 199.446772180411, 'Ireland': 231.22452, 'Italy ': 148.894551667369, 'Japan': 325.711372400686, 'Korea': 287.596845923687, 'Latvia': 316.55785239539, 'Lithuania': 303.647286983591, 'Luxembourg': 247.225461265374, 'Mexico': 302.333564, 'Netherlands': 217.630356318106, 'New Zealand': 241.0, 'Norway ': 200.78622, 'Poland': 229.279299471607, 'Portugal': 258.797202797203, 'Slovenia': 226.685159500693, 'Spain': 175.7783, 'Sweden': 262.187136990862, 'Turkey': 217.02784, 'UK': 235.493233853676, 'USA': 251.137774, 'China': 314.781401804303, 'India': 271.540982812833, 'South Africa': 188.80248}


In [25]:
number = min(work.values())
print(number)

148.894551667369


In [26]:
idx = list(work.values()).index(number)
print(idx)

12


In [27]:
list(work.keys())[idx]

'Italy '
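
The three steps above can be combined into one, since min accepts a key function that decides what is compared. A sketch using a few entries copied from the work dictionary:

```python
# A few entries copied from the work dictionary above
work = {'France': 170.060642467443,
        'Italy ': 148.894551667369,
        'Spain': 175.7783}

# min iterates over the keys and compares them by their values
country = min(work, key=work.get)
print(country.strip(), work[country])
```

Note that the key is 'Italy ' with a trailing space, exactly as it appears in the file, so strip is used when printing.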

# Errors and exceptions
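We often want to convert data from files (strings) to floating point numbers. The conversion fails with an exception on lines that do not contain numbers, and a try/except block lets us handle the failure and continue: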

In [29]:
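infile = open('example_data.txt')
numbers = []
for line in infile:
    info = line.split()
    try:
        number = float(info[0])
    except (ValueError, IndexError):
        # Not a number (or an empty line): report it and go to the next line
        print('Skipping line: ', line)
        continue
    numbers.append(number)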
    

Skipping line:  This is the first line of the file

Skipping line:  This is the second line of the file

Skipping line:  Below comes the interesting part of the file: 



## Raise (throw) an exception
When working with input data, we often want the program to fail when wrong input is provided:

In [32]:
message = input('Write hello')
if message != 'hello':
    raise Exception('The input should be hello')

Write hellohallo


Exception: The input should be hello

Python has many built-in exception types. ValueError, for example, is meant for a value of the right type that is still not acceptable:

In [2]:
number = float(input('Write a number between 0 and 10'))
if number <= 0 or number >= 10:
    raise ValueError('The number must be between 0 and 10')

Write a number between 0 and 1099


ValueError: The number must be between 0 and 10
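
Raising and catching can be combined: one part of the program raises the exception, and the caller decides what to do with it. A sketch where the function name read_number and the list of replies are made up, with the replies standing in for what a user could type:

```python
def read_number(text):
    """Convert text to a float between 0 and 10, or raise ValueError."""
    number = float(text)    # float itself raises ValueError for e.g. 'ten'
    if number <= 0 or number >= 10:
        raise ValueError('The number must be between 0 and 10')
    return number


results = []
for reply in ['5', '99', 'ten']:    # made-up replies instead of input(...)
    try:
        results.append(read_number(reply))
    except ValueError as error:
        print(f'Could not use {reply!r}: {error}')
        results.append(None)

print(results)
```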

### Exercise

Locate the file GRA4157/lectures/02-python-summary2/data/Time-use.txt.

1) Write a program that reads the file, and prints out all information about Norway. 

2) Write a program that reads the file, and prints out the information about Leisure time for all countries. 

3) Write a program that reads the file, and writes a new file sleep.txt, only consisting of the hours of sleep per country. sleep.txt thus contains two columns, one column of countries and a corresponding column with hours of sleep. The header should be "Country Sleep-hours".

4) Write a program that computes a "happiness score" per country. The happiness score is computed via: hours_of_sleep + seeing_friends + other_leisure - abs(tv_and_radio - 100) - 0.2*paid_work

# More on reading files

In [4]:
with open('example_data.txt') as infile:
    for line in infile:
        print(line)

This is the first line of the file

This is the second line of the file

Below comes the interesting part of the file: 

10 20 30 

20 30 1

2.2 125 6.45

0.1 20 3.14
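
The with statement closes the file automatically when the indented block ends, even if an error occurs inside the block. A sketch that checks this through the closed attribute of the file object; the file with_demo.txt is a temporary one created only for this example:

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory
filename = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(filename, 'w') as outfile:
    outfile.write('10 20 30\n')

with open(filename) as infile:
    first_line = infile.readline()
    print(infile.closed)    # False: we are still inside the with block

print(infile.closed)        # True: the file was closed when the block ended
```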



In [13]:
all_data = {}
with open('data/Time-use.txt') as infile:
    headers = infile.readline().strip().split(',')
    for line in infile:
        info = line.split(',')
        country = info[1].strip()
        if country not in all_data:
            all_data[country] = {}
        # Store every category, including the first one seen for a country
        all_data[country][info[2]] = float(info[3])

            
all_data['Norway']

{'Paid work': 200.78622,
 'Education': 40.478844,
 'Care for household members ': 23.0,
 'Housework': 82.88683,
 'Shopping': 20.44092,
 'Other unpaid work & volunteering': 76.614153,
 'Sleep': 492.1239,
 'Eating and drinking': 79.37977,
 'Personal care': 55.76926,
 'Sports': 20.90907,
 'Attending events': 8.056627,
 'Seeing friends': 57.0152,
 'TV and Radio': 128.6927,
 'Other leisure activities': 153.5697}

We will later work with pandas, which can deal with these "nested dictionaries" automatically.
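
Until then, the csv module in the standard library can take care of the comma splitting and the header line. A minimal sketch, where io.StringIO stands in for the opened file and the two data rows are copied from the start of Time-use.txt:

```python
import csv
import io

# Header and two rows copied from the file; StringIO replaces open(...)
sample = io.StringIO(
    ",Country,Category,Time (minutes)\n"
    "0,Australia,Paid work,211.146629603892\n"
    "1,Austria,Paid work,279.53226810278\n"
)

all_data = {}
for row in csv.DictReader(sample):    # each row is a dict keyed by the header
    country = row["Country"].strip()
    category = row["Category"].strip()
    # setdefault creates the inner dictionary the first time a country is seen
    all_data.setdefault(country, {})[category] = float(row["Time (minutes)"])

print(all_data)
```

With the real file, the StringIO object would be replaced by the file object from open('data/Time-use.txt').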