# User input and text files



## Contents
  
* User input (sys.argv and argparse)
* User input (input)
* Useful string operations
* Reading data from files
* Writing data to files
* Errors/exceptions
* Nested data (YAML and JSON)
* Timing and efficiency

## Command line arguments:

Command line arguments are "words" written after the program name when you run it, e.g. 
```bash
python3 hello_world.py 10
```
Here the command line argument is 10. If you are running from Spyder (Anaconda) or any other IPython environment such as Jupyter, you can run the script with command line arguments by putting `!` in front of the shell command:

```python
!python3 hello_world.py 10
```

## Recap of the sys.argv list:

`sys` is a module in the Python standard library and is imported by
```python
import sys
```



### The first example (test_sys.py):


```python

import sys
print(f"In this program, {sys.argv[1]} is the command line argument")

```

```python
!python3 scripts/lecture-2-test-sys.py 10
```

### Several command line arguments (test_sys2.py)

The "magic" `sys.argv` refers to a list containing all the words (separated by spaces) written on the command line, including the program name. Since the program name is `sys.argv[0]`, the actual arguments are `sys.argv[1:]`:

```python
import sys

sys_argv_list = sys.argv
cmd_args = sys.argv[1:]

print("The sys.argv list looks like this: ", sys_argv_list)
print("But we are only interested in these arguments: ", cmd_args)
```

```python
!python3 scripts/lecture-2-test-sys2.py 10 20 30 40
```
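Every element of `sys.argv` is a string, so arguments must be converted before we can calculate with them. A minimal sketch; the function `sum_arguments` and the hard-coded list `fake_argv` are hypothetical stand-ins for a real script and command line:

```python
def sum_arguments(argv):
    """Add up all command line arguments after the program name.

    argv is expected to look like sys.argv: the program name first,
    followed by the arguments as strings.
    """
    return sum(float(arg) for arg in argv[1:])

# Hypothetical stand-in for sys.argv when running: python3 test_sys2.py 10 20 30 40
fake_argv = ["test_sys2.py", "10", "20", "30", "40"]
print(fake_argv[1] + fake_argv[2])  # string concatenation: 1020
print(sum_arguments(fake_argv))     # 100.0
```

In a real script you would call `sum_arguments(sys.argv)` instead.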

# argparse
`argparse` is a module in the Python standard library for parsing command line arguments. It makes it easy to write user-friendly command line interfaces: you define which arguments your program requires, and `argparse` parses them and automatically generates help and error messages.

Why use `argparse` instead of `sys.argv`?

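* Provides a lot of flexibility to specify how command line arguments should be parsed, including optional arguments with default values (e.g. `-a`/`--age`).
* Converts arguments to the requested type (e.g. `type=int`) and reports an error if the conversion fails.
* Automatically generates help messages.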

```python
import argparse

# Initialize the parser
parser = argparse.ArgumentParser(description="This is a simple example.")

# Add arguments
parser.add_argument("name", help="Your name")
parser.add_argument("-a", "--age", help="Your age", type=int, default=0)

# Parse the arguments
args = parser.parse_args()

# Use the arguments
print(f"Hello, {args.name}!")
if args.age:
    print(f"You are {args.age} years old.")
```

```python
!python ./scripts/lecture-2-argparse.py --help
```

```python
!python ./scripts/lecture-2-argparse.py vegard -a 33
```
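`parse_args` also accepts an explicit list of strings instead of reading `sys.argv`. This is a convenient way to try out a parser inside a notebook, where calling `parse_args()` with no list would typically see the notebook kernel's own arguments instead of yours. A small sketch reusing the parser from the script above:

```python
import argparse

# Same parser as in the script above
parser = argparse.ArgumentParser(description="This is a simple example.")
parser.add_argument("name", help="Your name")
parser.add_argument("-a", "--age", help="Your age", type=int, default=0)

# Pass the "command line" as a list of strings
args = parser.parse_args(["vegard", "--age", "33"])
print(args.name, args.age, type(args.age))  # vegard 33 <class 'int'>

# Without -a/--age the default value is used
args_default = parser.parse_args(["vegard"])
print(args_default.age)  # 0
```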

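For comparison, here is the same program written with `sys.argv`. It has no help message and no type conversion, and it stops with an `IndexError` if the name is missing:

```python
import sys

name = sys.argv[1]        # IndexError if no name is given
print(f"Hello {name}")
if len(sys.argv) > 2:
    age = sys.argv[2]     # still a string, not an int
    print(f"You are {age} years old")
```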

# The input function
Another way of getting information from the user is the `input` function. The user does not need to provide command line arguments, but replies to questions from the program. Note that `input` always returns a string:

```python
number = input('Write a number:')
print(f'Your number is {number}')
```

```python
numbers = input("Write many numbers separated by spaces: ")
print(f"Your numbers are {numbers}")
```

```python
numbers = input("Write many numbers separated by spaces: ")
list_of_numbers = numbers.split()   # list of strings, e.g. ['3', '4.5']
e0 = float(list_of_numbers[0])      # convert to numbers before adding
e1 = float(list_of_numbers[1])
print(e0 + e1)
```

The final example gives a list of strings, quite similar to `sys.argv[1:]`. I prefer `sys.argv` or `argparse`: it is faster, since the program does not stop and wait for the user to type.
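Whether the numbers come from `input` or from `sys.argv[1:]`, a list comprehension can convert them all at once. A sketch; the function `parse_numbers` and the hard-coded `reply` (standing in for what `input()` would return) are hypothetical:

```python
def parse_numbers(text):
    """Convert a string of space-separated numbers to a list of floats."""
    return [float(word) for word in text.split()]

# Hypothetical reply, as if the user had typed "3 4.5 10"
reply = "3 4.5 10"
numbers = parse_numbers(reply)
print(numbers)       # [3.0, 4.5, 10.0]
print(sum(numbers))  # 17.5
```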

## Useful string operations 

```python
info = input("Write the current day, date and time separated by spaces")
infolist = info.split()
day = infolist[0]
date = infolist[1:-1]
time = infolist[-1]
print(f"Today is {day}. The date is {date}. The time is {time}")
```

```python
info = input("Write the current day, date and time separated by commas")
infolist = info.split(',')
day = infolist[0]
date = infolist[1]
time = infolist[2]
print(f"Today is {day}. The date is {date}. The time is {time}")
```
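With `split(',')` the spaces after the commas stay in the pieces. A sketch using a hard-coded string (a hypothetical reply) in place of `input`, showing how `strip()` cleans each piece:

```python
# Hypothetical reply, standing in for what input() might return
info = "Monday, 09 September, 10:15"

raw_parts = info.split(',')
print(raw_parts)  # ['Monday', ' 09 September', ' 10:15']

# strip() removes leading and trailing whitespace from each piece
day, date, clock = [part.strip() for part in raw_parts]
print(f"Today is {day}. The date is {date}. The time is {clock}")
```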

## The join method
Define the list
```python
date = ['09', 'September']
```
Can you combine the items in the list into one string that says only "09 September"?

```python
date = ['09', 'September']
' '.join(date)
```

## Adding strings

```python
date[0]+' '+date[1]
```
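Both approaches give "09 September", but `join` works for lists of any length and lets you choose the separator. It only accepts strings, so numbers must be converted first. A short sketch:

```python
date = ['09', 'September']

# The separator string is placed between the items
print(' '.join(date))   # 09 September
print('-'.join(date))   # 09-September

# join needs strings, so numbers are converted with str() first
numbers = [1, 2, 3]
print(', '.join(str(n) for n in numbers))  # 1, 2, 3

# An f-string gives the same result as adding the strings
print(f"{date[0]} {date[1]}")  # 09 September
```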

# Reading from text files (.txt, .dat etc.)

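To open a data file, use the syntax `open(filename)`, where `filename` is the name of the file as a string. A plain file name refers to a file in the current working directory (the folder Python is run from, which is often, but not always, the folder of the .py file):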
```python
infile = open('example_data.txt')
```
Here example_data.txt may look like this:
```text
This is the first line of the file
This is the second line of the file
Below comes the interesting part of the file: 
10 20 30 
20 30 1
2.2 125 6.45
0.1 20 3.14
```

```python
infile = open('data/example_data.txt')
infile
```

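We can read the file line by line using the method `readline`. Each line it returns ends with a newline character `\n`, and since `print` adds another one, the output has a blank line after each line: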

```python
line1 = infile.readline()
line2 = infile.readline()
print(line1)
print(line2)
```

If we are not interested in the first (few) lines we can call infile.readline() a few times to skip those lines. 

```python
infile = open('data/example_data.txt')
infile.readline()
infile.readline()
line3 = infile.readline()
print(line3)
```

The file object (a `TextIOWrapper`) can be iterated over, and the iteration starts at the current position in the file. Since opening the file the last time we have already called `infile.readline()` three times, so the first three lines are omitted in the for loop below:

```python
for line in infile:
    print(line)
```

```python
# Full program to print the interesting lines:
infile = open('data/example_data.txt')
infile.readline()   # skip the three header lines
infile.readline()
infile.readline()
for line in infile:
    print(line)
infile.close()
```
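Stripping each line removes the trailing `\n`, so the output has no extra blank lines. A sketch where `io.StringIO` (an in-memory object that behaves like an opened text file) stands in for data/example_data.txt, with the contents copied from above, so it runs without the data folder:

```python
import io

# In-memory stand-in for data/example_data.txt
infile = io.StringIO(
    "This is the first line of the file\n"
    "This is the second line of the file\n"
    "Below comes the interesting part of the file: \n"
    "10 20 30 \n"
    "20 30 1\n"
    "2.2 125 6.45\n"
    "0.1 20 3.14\n"
)

# Skip the three header lines
for _ in range(3):
    infile.readline()

# strip() removes the '\n' (and other surrounding whitespace) from each line
interesting = [line.strip() for line in infile]
for line in interesting:
    print(line)
```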

We can use `readlines()` to read all lines at once into a list of strings:

```python
infile = open('data/example_data.txt')
lines = infile.readlines()
print(lines)
```

Now assume that we want to store the numbers from the file in three lists (columns): c1, c2 and c3. In the end we should have:
```python
c1 = [10, 20, 2.2, 0.1]
c2 = [20, 30, 125, 20]
c3 = [30, 1, 6.45, 3.14]
```

## Exercise:
1) Read the file "GRA4157/lectures/02-python-summary-2/data/example_data.txt" in python

2) Create empty lists c1, c2, and c3. Then iterate over the infile and add the first number in each line to c1, the second number in each line to c2 and the third number to c3. The type of objects in the lists should be float. 

```python
filename = "data/example_data.txt"
infile = open(filename)
infile.readline()
infile.readline()
infile.readline()
c1 = []
c2 = []
c3 = []
for line in infile:
    numbers = line.split()
    c1.append(float(numbers[0]))
    c2.append(float(numbers[1]))
    c3.append(float(numbers[2]))

print(c1,c2,c3)
```

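```python
filename = "data/example_data.txt"
infile = open(filename)
lines = infile.readlines()
infile.close()
# The three header lines all contain the word "file", so the filter skips them;
# the outer loop over col builds one list (column) per position in the line
c1, c2, c3 = [[float(line.split()[col]) for line in lines if "file" not in line] for col in [0,1,2]]
print(c1, c2, c3)
```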
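Another common pattern is to convert each data line to a row of floats and then let `zip(*rows)` turn the rows into columns. A sketch with `io.StringIO` standing in for the file (contents copied from example_data.txt above):

```python
import io

infile = io.StringIO(
    "This is the first line of the file\n"
    "This is the second line of the file\n"
    "Below comes the interesting part of the file: \n"
    "10 20 30 \n"
    "20 30 1\n"
    "2.2 125 6.45\n"
    "0.1 20 3.14\n"
)
lines = infile.readlines()

# One list of floats per data line, skipping the three header lines
rows = [[float(word) for word in line.split()] for line in lines[3:]]

# zip(*rows) groups the first, second and third elements of every row,
# i.e. it turns rows into columns
c1, c2, c3 = (list(column) for column in zip(*rows))
print(c1, c2, c3)
```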

## Useful string operations 2:

The string methods `startswith` and `endswith`, together with the `in` operator, are useful when reading files.

Consider the file example_data2.txt:
```text
This is a header
This is a header
Numbers: 1 2 3
Numbers: 2 3 4
5 6 7
```
1) We are only interested in the lines that start with "Numbers":

```python
infile = open('data/example_data2.txt')
for line in infile:
    if line.startswith('Number'):
        print(line)
```

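2) We are only interested in the lines that do not end with "header". Each line read from a file ends with the newline character `\n`, so we either include it in the test or remove it with `strip()` first: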

```python
infile = open('data/example_data2.txt')
for line in infile:
    if not line.endswith('header\n'):
        print(line)
```

```python
infile = open('data/example_data2.txt')
for line in infile:
    if not line.strip().endswith('header'):
        print(line)
```

3) We are only interested in lines that contain the number 2:

```python
infile = open('data/example_data2.txt')
for line in infile:
    if '2' in line:
        print(line)
infile.close()
```
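The `in` operator checks for substrings, so `'2' in line` is also true for a line containing "12" or "2.5". Splitting the line first checks for whole words instead. A sketch with a hypothetical line (not taken from example_data2.txt); it also shows that `startswith` and `endswith` accept a tuple of alternatives:

```python
line = "Numbers: 12 3 4\n"   # hypothetical line

print('2' in line)             # True: '2' is a substring of '12'
print('2' in line.split())     # False: no separate word equals '2'

# startswith and endswith also accept a tuple of alternatives
print(line.startswith(('Numbers', 'Values')))  # True
print(line.strip().endswith(('3', '4')))       # True
```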

# Writing to file:

To write to file, we still use the open() function, but we have to specify that we want to write to file. 
```python
outfile = open('outfile.txt','w')
```
The mode, here `"w"`, indicates that we want to write to the file. The default mode `"r"` (used when no mode is given, as in the previous examples) opens the file for reading. Warning: if outfile.txt already exists, its contents are erased as soon as it is opened in `"w"` mode.

```python
outfile = open('data/outfile.txt','w')
outfile.write('This is the first line of the file')
outfile.close()
```

```python
outfile = open('data/outfile.txt','w')
outfile.write('The previous line is deleted and this is the new line')
outfile.close()
```

We can append to existing files using the 'a' mode when opening:

```python
outfile = open('data/outfile.txt','a')
outfile.write('The previous text is still there, and this line was just appended')
outfile.close()
```

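`write` does not add a newline, so the appended text above ends up on the same line as the previous text. Use `\n` to start a new line: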

```python
outfile = open('data/outfile.txt','a')      # use 'w' to write to new file 'a' to append to an already existing file
outfile.write("Now let's add a new line:\nThis is a new line")
outfile.close()
```

Normally we perform all the operations we want on a file before closing it. In the previous examples we closed the file after each operation so that we could inspect the changes as we went; text written to a file may not appear on disk until the file is closed.
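A `with` block closes the file automatically when the block ends, so there is no `close()` to forget. A sketch of writing, appending and reading back; it works in a temporary folder (an assumption made so that no existing file is overwritten):

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'outfile.txt')

# 'w' creates the file (or empties an existing one)
with open(filename, 'w') as outfile:
    outfile.write('First line\n')
    outfile.write('Second line\n')

# 'a' keeps the existing content and writes at the end
with open(filename, 'a') as outfile:
    outfile.write('Appended line\n')

with open(filename) as infile:
    content = infile.read()
print(content)
```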

## Exercise: write a table to file
Assume that we have 10 numbers in a python list: [1,2,3,4,5,6,7,8,9,10]. 
Use python to write a file that contains the numbers as a column and another column with the square root of the given number. The file should look like this: 
```bash
x sqrt(x)
1 1
2 1.41
...
```
You can decide on how many decimals and on which format the sqrt(x) should contain in the file. 
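One possible solution, as a sketch: the helper name `write_sqrt_table`, the two-decimal format and the use of a temporary folder (so no existing file is overwritten) are our own choices, not requirements of the exercise:

```python
import math
import os
import tempfile

def write_sqrt_table(filename, numbers):
    """Write numbers and their square roots as two columns (hypothetical helper)."""
    with open(filename, "w") as outfile:
        outfile.write("x sqrt(x)\n")
        for x in numbers:
            # :.2f formats the square root with two decimals
            outfile.write(f"{x} {math.sqrt(x):.2f}\n")

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
path = os.path.join(tempfile.mkdtemp(), "sqrt_table.txt")
write_sqrt_table(path, numbers)

# Read the file back to check the result
with open(path) as infile:
    table_lines = infile.read().splitlines()
print(table_lines[:3])  # ['x sqrt(x)', '1 1.00', '2 1.41']
```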

## Starting to work with bigger data sets
Example: How do people spend their time?
<img src="figs/Timeuse.png" style="width: 90%; margin: auto;">



I have exported the data to a .txt file: 
```text
,Country,Category,Time (minutes)
0,Australia,Paid work,211.146629603892
1,Austria,Paid work,279.53226810278
2,Belgium,Paid work,194.476452188763
3,Canada,Paid work,268.660609647898
4,Denmark,Paid work,199.771595915566
...
```


Let us now assume that we are only interested in how much people in a given country work (Paid work). How would you extract this information? The file has 462 lines, so manual reading is not effective.

```python
infile = open('data/Time-use.txt')
for line in infile:
    if "Paid work" in line:
        print(line)
```

```python
infile = open('data/Time-use.txt')
line0 = infile.readline()
line1 = infile.readline()
print(line1)
info = line1.split(',')
print(info)
```

Let's say we are now interested in a dictionary that contains the number of minutes of paid work per day in all countries.

We run over all lines in the file and find the lines that contain "Paid work". Then we extract the country and the number of minutes and add them to the dictionary.

```python
infile = open('data/Time-use.txt')
work = {}
for line in infile:
    if "Paid work" in line:
        info = line.split(',')
        country = info[1].strip()
        minutes = info[-1]
        work[country] = round(float(minutes.strip()), 1)
print(work)
```

```python
print(work['India'])
print(work['Norway'])
```

```python
number = max(work.values())
print(number)
```

```python
idx = list(work.values()).index(number)
print(idx)
```

```python
list(work.keys())[idx]
```
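Finding the index and then looking up the key works, but `max` can return the key directly when given `key=work.get`. A sketch using the five values from the file excerpt shown earlier, rounded to one decimal as in the dictionary above:

```python
# The five "Paid work" values from the file excerpt, rounded to one decimal
work = {'Australia': 211.1, 'Austria': 279.5, 'Belgium': 194.5,
        'Canada': 268.7, 'Denmark': 199.8}

# max() compares the values (via work.get) but returns the matching key
busiest = max(work, key=work.get)
print(busiest, work[busiest])  # Austria 279.5

# sorted() with the same key ranks all countries, highest first
ranking = sorted(work, key=work.get, reverse=True)
print(ranking)  # ['Austria', 'Canada', 'Australia', 'Denmark', 'Belgium']
```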

### Exercise

Locate the file GRA4157/lectures/02-python-summary2/data/Time-use.txt.

1) Write a program that reads the file, and prints out all information about Norway. 

2) Write a program that reads the file, and prints out the time spent on other leisure activities for all countries.

3) Write a program that reads the file, and writes a new file sleep.txt, only consisting of the minutes of sleep per country. sleep.txt thus contains two columns, one column of countries and a corresponding column with minutes of sleep. The header should be "Country Sleep-minutes".

4) Write a program that computes a "happiness score" per country. The happiness score is computed via: `hours_of_sleep + seeing_friends + other_leisure + 1.2*education - 0.2*paid_work`

# Errors and exceptions
We often want to convert data from files (strings) to floating point numbers:

```txt
This is the first line of the file
This is the second line of the file
Below comes the interesting part of the file: 
10 20 30 
20 30 1
2.2 125 6.45
0.1 20 3.14
```

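We can use `try`/`except` to skip the lines that cannot be converted:

```python
infile = open('data/example_data.txt')
numbers = []
for line in infile:
    info = line.split()
    try:
        number = float(info[0])
        numbers.append(number)
    except (ValueError, IndexError):
        # ValueError: the first word is not a number (e.g. 'This')
        # IndexError: the line is empty, so info[0] does not exist
        print('Skipping line: ', line)
infile.close()

print(numbers)
```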
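Catching specific exception types, rather than everything, avoids hiding unrelated bugs such as a misspelled variable name. A sketch that packages the loop above into a function; the name `first_column` and the sample text are hypothetical, and `io.StringIO` stands in for an opened file:

```python
import io

def first_column(infile):
    """Return the first number on each line, skipping lines that are not numeric."""
    numbers = []
    for line in infile:
        words = line.split()
        try:
            numbers.append(float(words[0]))
        except ValueError:
            # float() failed, e.g. for the word 'Header'
            print('Skipping line:', line.strip())
        except IndexError:
            # empty line, so words[0] does not exist
            print('Skipping empty line')
    return numbers

# Hypothetical sample: a header line, an empty line and two data lines
sample = io.StringIO("Header line\n\n10 20 30\n2.2 125 6.45\n")
result = first_column(sample)
print(result)  # [10.0, 2.2]
```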
    

## Raise (throw) an exception
When working with input data, we often want the program to fail when wrong input is provided:

```python
message = input('Write hello')
if message != 'hello':
    raise Exception('The input should be hello')
```

Python has many built-in exception types. For example, `ValueError` signals that a value has the right type but an inappropriate value:

```python
number = float(input('Write a number between 0 and 10'))
if number <= 0 or number >= 10:
    raise ValueError('The number must be between 0 and 10')
```
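A raised exception can be caught with `try`/`except` like any other. A sketch that moves the range check into a function (the name `check_number` is hypothetical) so invalid values can be handled without stopping the program:

```python
def check_number(number):
    """Raise ValueError unless 0 < number < 10 (same rule as above)."""
    if number <= 0 or number >= 10:
        raise ValueError('The number must be between 0 and 10')
    return number

for candidate in [5, 12]:
    try:
        check_number(candidate)
        print(candidate, 'is accepted')
    except ValueError as error:
        # error holds the message given to ValueError
        print(candidate, 'is rejected:', error)
```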

# More on reading files

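The `with` statement opens the file and closes it automatically when the block ends, even if an error occurs inside the block:

```python
with open('data/example_data.txt') as infile:
    for line in infile:
        print(line)
```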
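The `closed` attribute of a file object shows that the `with` block really does close the file. A sketch that first writes a small made-up file to a temporary folder so it runs anywhere:

```python
import os
import tempfile

# Create a small (made-up) file in a temporary folder to read from
filename = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(filename, 'w') as outfile:
    outfile.write('10 20 30\n20 30 1\n')

with open(filename) as infile:
    print(infile.closed)   # False: the file is open inside the block
    lines = infile.readlines()

print(infile.closed)       # True: closed automatically when the block ended
print(lines)               # ['10 20 30\n', '20 30 1\n']
```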

Gather all information about the countries in a nested dictionary:

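```python
all_data = {}
with open('data/Time-use.txt') as infile:
    headers = infile.readline().strip().split(',')   # the header is comma-separated
    for line in infile:
        info = line.strip().split(',')
        country = info[1].strip()
        category = info[2].strip()
        if country not in all_data:
            all_data[country] = {}                    # first time we see this country
        all_data[country][category] = float(info[3])  # runs for every line, including the first

all_data['Austria']
```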
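`dict.setdefault` can replace the `if country not in all_data` check. A sketch on the header and first three data lines of the Time-use.txt excerpt shown earlier, using `io.StringIO` in place of the file; it assumes every line has exactly four comma-separated fields:

```python
import io

# Header and first data lines from the Time-use.txt excerpt
infile = io.StringIO(
    ",Country,Category,Time (minutes)\n"
    "0,Australia,Paid work,211.146629603892\n"
    "1,Austria,Paid work,279.53226810278\n"
    "2,Belgium,Paid work,194.476452188763\n"
)

all_data = {}
headers = infile.readline().strip().split(',')
for line in infile:
    index, country, category, minutes = line.strip().split(',')
    # setdefault inserts an empty dict the first time a country is seen
    # and returns the (new or existing) inner dict
    all_data.setdefault(country, {})[category] = float(minutes)

print(headers)              # ['', 'Country', 'Category', 'Time (minutes)']
print(all_data['Austria'])  # {'Paid work': 279.53226810278}
```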

We will later work with pandas, which handles tables like this one (and the corresponding nested dictionaries) automatically.

For general nested data that maps naturally onto dictionaries, the YAML and JSON formats are often used.

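```python
import yaml  # third-party package PyYAML (pip install pyyaml)

with open("data/yaml_data.yaml") as file:
    # safe_load only builds plain Python objects (dicts, lists, strings, numbers, ...)
    data = yaml.safe_load(file)
data
```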

```python
data.keys()
```

```python
data["xmas-fifth-day"]
```

```python
data["xmas-fifth-day"].keys()
```

The `json` module (part of the standard library) is probably the most common tool for working with dictionary data in Python. `json.loads` converts a JSON string to a Python dict, and `json.dumps` converts a dict back to a string:

```python
import json
with open("data/yaml_data.json") as file:
    data = file.read()
d = json.loads(data)
d
```

```python
s = json.dumps(d)
s
```

```python
type(s)
```
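A round trip through `json.dumps` and `json.loads` gives back an equal dictionary, and the `indent` parameter makes the string easier to read. A sketch with a made-up dictionary; for files, `json.dump(d, file)` and `json.load(file)` write and read directly:

```python
import json

# A small, made-up nested dictionary
d = {'course': 'GRA4157', 'lectures': [1, 2, 3], 'topics': {'lecture 2': 'files'}}

s = json.dumps(d)              # dict -> JSON string
print(s)  # {"course": "GRA4157", "lectures": [1, 2, 3], "topics": {"lecture 2": "files"}}
print(json.dumps(d, indent=2))  # same data, spread over several indented lines

d2 = json.loads(s)             # JSON string -> dict
print(d2 == d)                 # True
```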

# Timing and efficiency

In real applications many scripts, processes and functions are interacting in a complex manner. If the end-to-end process of an application is slow, it is useful to measure execution time of each individual script/function/module. 

```python
import time
start_time = time.time()
for i in range(4):
    time.sleep(1)
    print("calculating some difficult stuff here...")

T = time.time() - start_time
print(f"Time spent {T:.2f} seconds")
```

```text
calculating some difficult stuff here...
calculating some difficult stuff here...
calculating some difficult stuff here...
calculating some difficult stuff here...
Time spent 4.00 seconds
```


```python
import time
import datetime
start_time = datetime.datetime.now()
for i in range(4):
    time.sleep(1)
    print("calculating some difficult stuff here...")

T = datetime.datetime.now() - start_time   # a datetime.timedelta
# total_seconds() keeps the fractions of a second (T.seconds is a whole number)
print(f"Time spent {T.total_seconds():.2f} seconds")
```

```text
calculating some difficult stuff here...
calculating some difficult stuff here...
calculating some difficult stuff here...
calculating some difficult stuff here...
Time spent 4.00 seconds
```


```python
def f0(x):
    for i in range(int(2e8)):
        x = x+0.0001
    return x

def f1(x):
    for i in range(int(1e3)):
        x = x + 0.1
    return x
```

```python
import time
x = 1
t0 = time.time()
A = f0(x)
print(f"Time spent calling f0 {time.time()-t0:.2f} seconds")
t1 = time.time()
B = f1(x)
print(f"Time spent calling f1 {time.time()-t1:.2f} seconds")
print(A+B)
```

```text
Time spent calling f0 6.22 seconds
Time spent calling f1 0.00 seconds
20101.999994261987
```
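For measuring how long code takes, the standard library also offers `time.perf_counter()`, a clock intended for measuring short durations with the highest available resolution. A sketch of a small timing helper (the name `timed` is hypothetical) applied to a copy of `f1` from above:

```python
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def f1(x):
    for i in range(int(1e3)):
        x = x + 0.1
    return x

result, seconds = timed(f1, 1)
print(f"f1 returned {result:.4f} in {seconds:.6f} seconds")
```

For repeated, more reliable measurements of small snippets, the standard library `timeit` module is an alternative.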
