# Read and Write a .txt File Task Sheet (solutions)

<img width=80 src="https://media.giphy.com/media/KAq5w47R9rmTuvWOWa/giphy.gif">

<img width=150 src="Images/Assembler.png">

# Important:

- Comment your code, explaining what each part does wherever you consider it helpful.
- There are two types of problems. The first type is solved with code cells; the second with markdown cells (where the solution must be reasoned out without executing code cells). Each problem indicates which type it is.
- Please always respect the instructions. If you are asked to use higher-order functions, use at least as many as requested.
- All files created or used must be contained in a folder called "Files" located in the directory where you place this Jupyter notebook.

# Recommendations:

- There are as many ways to solve a problem as there are people in the world. Find yours!
- Create as many variables as you want. They cost nothing and are worth it for the sake of clarity.
- You can add cells if needed. 
- Remember that there are two types of cells: code and markdown. Use both. Explanations never hurt.
- There are several ways to approach the same problem. Try not to repeat your way of thinking.
- If different syntaxes lead to the same result, explore them.
- Use the internet wisely. Don't look up how to solve the whole problem; it is better to learn methods that guide your reasoning towards the solution.
- Once you have a plan to address the problem, try breaking your code into manageable chunks.
- Use `print()` and `type()` functions in the middle of your code to understand what your code is actually doing.

***

## Multiples of six inside a .txt

Create a program that writes to a file called "multiples_of_six" the first 51 multiples of six.

In [3]:
# Type the code here:
count = 1
number = 0

with open('Files/multiples_of_six.txt', 'w') as f_six:
    while count <= 51:
        if number % 6 == 0:
            f_six.write(str(number)+'-')
            count += 1
        number +=1

In [4]:
with open('Files/multiples_of_six.txt', 'r') as f:
    print(f.read())

0-6-12-18-24-30-36-42-48-54-60-66-72-78-84-90-96-102-108-114-120-126-132-138-144-150-156-162-168-174-180-186-192-198-204-210-216-222-228-234-240-246-252-258-264-270-276-282-288-294-300-
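
The loop above leaves a hyphen after the last number. As a possible variant, the sketch below builds the sequence with `range()` and lets `join()` place the separator only between items. The function name `write_multiples`, its parameters, and the temporary demo path are assumptions made for illustration, not part of the task.

```python
import os
import tempfile

def write_multiples(path, base=6, how_many=51):
    """Write the first `how_many` multiples of `base` (starting at 0) to `path`."""
    # range(0, base * how_many, base) yields 0, base, 2*base, ... (how_many values)
    multiples = range(0, base * how_many, base)
    with open(path, "w", encoding="utf-8") as f:
        # join() puts the hyphen only between items, so there is no trailing one
        f.write("-".join(map(str, multiples)))

# Demonstration in a temporary folder so the "Files" folder is left untouched
demo_path = os.path.join(tempfile.mkdtemp(), "multiples_of_six.txt")
write_multiples(demo_path)
with open(demo_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

Whether the trailing hyphen matters depends on how the file is read later: splitting the original file on `'-'` produces an empty last element, while this variant does not.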


## File manipulation


### El adivino by Jorge Luis Borges

- Download the 'Borges_el_adivino.txt' file and place it in the 'Files' folder.

- Write a function called `head()` that receives a .txt file and an integer N, and prints the first N lines of the file.

- Write another function that prints the first character of each line.

In [107]:
# Type the code here:
def head(name, N):
    with open(name, "r", encoding='utf-8') as f:
        for _ in range(N):
            line = f.readline()
            print(line, end = '')

In [108]:
head("Files/Borges_el_adivino.txt", 15)

El adivino
[Minicuento. Texto completo.]
Jorge Luis Borges
En Sumatra, alguien quiere doctorarse de adivino.
El brujo examinador le pregunta si será reprobado o si pasará.
El candidato responde que será reprobado…
FIN
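
When N is larger than the number of lines, `readline()` simply keeps returning empty strings. One alternative, sketched here with `itertools.islice`, stops at the end of the file instead. The name `head_lines` and the small demo file are hypothetical; the function returns the lines rather than printing them so that the result can be inspected.

```python
import os
import tempfile
from itertools import islice

def head_lines(name, n):
    """Return the first n lines of a text file as a list (fewer if the file is shorter)."""
    with open(name, "r", encoding="utf-8") as f:
        # islice stops after n lines or at the end of the file, whichever comes first
        return [line.rstrip("\n") for line in islice(f, n)]

# Small demo file created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

print(head_lines(demo_path, 2))
print(head_lines(demo_path, 15))
```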


In [49]:
# Type the code here:
def first_letter(name):
    with open(name, "r", encoding='utf-8') as file:
        for line in file:
            print(line[0])

In [50]:
first_letter("Files/Borges_el_adivino.txt")

E
[
J
E
E
E
F




### Kafka, the segment

Write a function that opens a .txt file, processes it, and prints the number of lines, number of words, and number of characters in that file.

In [114]:
# Type the code here:
def count_things(name):
    with open(name, "r", encoding='utf-8') as file:
        lines = file.readlines()

    print(lines)
    line_counter = 0
    word_counter = 0
    character_counter = 0
    for line in lines:
        character_counter += len(line.rstrip('\n'))
        #print(f'{line} has {character_counter} additive characters')
        line = line.strip('\n[].,')
        line_counter += 1
        #print(line_counter,line)
        for word in line.split(' '):
            word_counter += 1
            #print(word_counter, word)
    print(f'''
    Total number of lines: {line_counter}.
    Total number of words: {word_counter}.
    Total number of characters: {character_counter}.
    ''')

In [115]:
count_things("Files/Kafka_segmento.txt")

['Un segmento\n', 'Franz Kafka, Diaries\n', 'Le fue cortado un segmento de la parte posterior de la cabeza.\n', 'Con el sol todo el mundo mira hacia dentro.\n', 'Esto lo pone nervioso, lo distrae del trabajo; además, le fastidia que, justamente él, deba ser excluido del espectáculo.']

    Total number of lines: 5.
    Total number of words: 45.
    Total number of characters: 257.
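
The same three counts can also be returned instead of printed, which makes the function easier to reuse. This is only a sketch under two stated assumptions: words are whitespace-separated tokens, and newline characters are not counted as characters. The name `count_file` and the demo file are made up for the example.

```python
import os
import tempfile

def count_file(name):
    """Return (lines, words, characters) for a text file."""
    with open(name, "r", encoding="utf-8") as f:
        # splitlines() drops the newline at the end of each line
        lines = f.read().splitlines()
    # split() without arguments ignores repeated spaces and empty lines
    n_words = sum(len(line.split()) for line in lines)
    n_chars = sum(len(line) for line in lines)
    return len(lines), n_words, n_chars

# Small demo file created in a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("one two\nthree, four five\n")

print(count_file(demo_path))
```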
    


### Mixing writers

- Join both texts in a new file. Separate them with a blank line.


- Create a function that counts how many times each word is repeated and shows the final count of all of them.
  - Punctuation marks are not part of the word.
  - A word is the same even if it appears in upper and lower case.

In [119]:
# Type the code here:
data1 = data2 = ""
  
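# The texts contain accented characters, so the encoding is stated explicitly
with open('Files/Borges_el_adivino.txt', encoding='utf-8') as fp:
    data1 = fp.read()
#print(data1)

with open('Files/Kafka_segmento.txt', encoding='utf-8') as fp:
    data2 = fp.read()
#print(data2)

# Merging 2 files: data1 already ends with a newline, so one more gives a blank line
data1 += "\n"
data1 += data2
  
with open ('Files/Mixing_writers.txt', 'w', encoding='utf-8') as fp:
    fp.write(data1)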

In [120]:
with open('Files/Mixing_writers.txt', 'r', encoding='utf-8') as f:
    print(f.read())

El adivino
[Minicuento. Texto completo.]
Jorge Luis Borges
En Sumatra, alguien quiere doctorarse de adivino.
El brujo examinador le pregunta si será reprobado o si pasará.
El candidato responde que será reprobado…
FIN

Un segmento
Franz Kafka, Diaries
Le fue cortado un segmento de la parte posterior de la cabeza.
Con el sol todo el mundo mira hacia dentro.
Esto lo pone nervioso, lo distrae del trabajo; además, le fastidia que, justamente él, deba ser excluido del espectáculo.


In [131]:
# Type the code here:

# Create an empty dictionary
d = dict()
with open("Files/Mixing_writers.txt", "r", encoding='utf-8') as text:

    # Loop through each line of the file
    for line in text:
        # Remove leading and trailing whitespace, including the newline character
        line = line.strip()
        # Split the line into words; split() without arguments ignores empty strings
        words = line.split()
        # Strip punctuation characters from both ends of each word and
        # convert to lowercase to avoid case mismatch
        words = [w.strip('.,;:…[]').lower() for w in words]

        # Iterate over each word in line
        for word in words:
            # Skip tokens that consisted only of punctuation
            if not word:
                continue
            # Check if the word is already in dictionary
            if word in d:
                # Increment count of word by 1
                d[word] = d[word] + 1
            else:
                # Add the word to dictionary with count 1
                d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])

el : 5
adivino : 2
minicuento : 1
texto : 1
completo : 1
jorge : 1
luis : 1
borges : 1
en : 1
sumatra : 1
alguien : 1
quiere : 1
doctorarse : 1
de : 3
brujo : 1
examinador : 1
le : 3
pregunta : 1
si : 2
será : 2
reprobado : 2
o : 1
pasará : 1
candidato : 1
responde : 1
que : 2
fin : 1
un : 2
segmento : 2
franz : 1
kafka : 1
diaries : 1
fue : 1
cortado : 1
la : 2
parte : 1
posterior : 1
cabeza : 1
con : 1
sol : 1
todo : 1
mundo : 1
mira : 1
hacia : 1
dentro : 1
esto : 1
lo : 2
pone : 1
nervioso : 1
distrae : 1
del : 2
trabajo : 1
además : 1
fastidia : 1
justamente : 1
él : 1
deba : 1
ser : 1
excluido : 1
espectáculo : 1
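
For comparison, the counting can be sketched with `collections.Counter` and a regular expression, so that no explicit list of punctuation marks is needed. The function name `word_counts` is hypothetical, and the sample reuses two sentences from the Borges text rather than reading the file.

```python
import re
from collections import Counter

def word_counts(text):
    """Count words case-insensitively, ignoring punctuation."""
    # \w+ matches runs of letters, digits and underscores (accented letters included),
    # so commas, brackets, semicolons and the ellipsis character are left out
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

sample = ("El brujo examinador le pregunta si será reprobado o si pasará.\n"
          "El candidato responde que será reprobado…")
counts = word_counts(sample)
for word, n in counts.items():
    print(word, ":", n)
```

A `Counter` behaves like a dictionary, so it can be printed with the same loop used in the solution, and `counts.most_common()` returns the words sorted by frequency.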


## Otitis patient information in nested dictionaries

Below is a dictionary of patients with their personal information and a Boolean diagnosis indicating whether they suffer from chronic otitis. Each key contains all the given names that appear on the patient's DNI, but only the first surname.

In [167]:
patients = {'Leopoldo Vidal-Ribas': {'age': 49, 'gender': 'male', 'otitis': True},
            'Clota Anastasia Sagnier': {'age': 17, 'gender': 'female', 'otitis': False},
            'Noah Costa': {'age': 32, 'gender': 'non-binary', 'otitis': False},
            'María Camila Elisa Valente': {'age': 67, 'gender': 'female', 'otitis': False},
            'Lucrecia Stanislawsky': {'age': 13, 'gender': 'female', 'otitis': True},
            'Omar Casimiro Recolons': {'age': 51, 'gender': 'male', 'otitis': False},
           }
patients

{'Leopoldo Vidal-Ribas': {'age': 49, 'gender': 'male', 'otitis': True},
 'Clota Anastasia Sagnier': {'age': 17, 'gender': 'female', 'otitis': False},
 'Noah Costa': {'age': 32, 'gender': 'non-binary', 'otitis': False},
 'María Camila Elisa Valente': {'age': 67,
  'gender': 'female',
  'otitis': False},
 'Lucrecia Stanislawsky': {'age': 13, 'gender': 'female', 'otitis': True},
 'Omar Casimiro Recolons': {'age': 51, 'gender': 'male', 'otitis': False}}

We want to deliver this information in a .txt file with a structure that can be used as a database/dataframe<a name="cite_ref-1"></a>[<sup>[1]</sup>](#cite_note-1). The required structure goes as follows:

- A first header line with the fields 'surname', 'age', 'gender' and 'otitis', separated by tabs. Except for the surname, generate the headers ('age', 'gender' and 'otitis') using the keys of the patients dictionary.


- The successive lines must contain the information of each patient in the same order as the header. Fields must always be separated by tabs on each line. Generate the entire content of the .txt using control flow statements; the algorithm must be automated for future check-ins.
  
  For the "otitis" field, it would be nice to replace the boolean `True`/`False` with `"yes"`/`"no"`. 
  
The file must be built as required.

<br>

<a name="cite_note-1"></a> [[1]](#cite_ref-1) You can research what databases and dataframes are.


In [168]:
# Type the code here:

with open('Files/patients.txt', 'w') as file_patients:

    # For header:
    header_list = ['surname']
    for header in list(patients[list(patients.keys())[0]].keys()):
        header_list.append(header)
        
    #print(header_list)
    
    # For rows (each patient):
    all_patient_list = []
    for k, v in patients.items():
        patient_sublist = [k.split()[-1]]
        for sub_k, sub_v in v.items():
            #print(sub_v)
            # Compare with `is` so that only real booleans are replaced (1 == True in Python)
            patient_sublist.append("yes" if sub_v is True else "no" if sub_v is False else str(sub_v))
        
        all_patient_list.append(patient_sublist)
        #print (k, patient_sublist, all_patient_list)
    
    #print(all_patient_list)
    
    # Write the file with the created lists:
    file_patients.write('\t'.join(header_list))
    file_patients.write('\n')
    for sublist in all_patient_list:
        file_patients.write('\t'.join(sublist))
        file_patients.write('\n')
        #print(sublist)

In [169]:
with open('Files/patients.txt', 'r') as f:
    print(f.read())

surname	age	gender	otitis
Vidal-Ribas	49	male	yes
Sagnier	17	female	no
Costa	32	non-binary	no
Valente	67	female	no
Stanislawsky	13	female	yes
Recolons	51	male	no
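
Tab-separated files can also be produced with the standard `csv` module by setting `delimiter="\t"`. The sketch below is one possible way to do it, not the required solution: `write_patients_tsv` and `demo_patients` are hypothetical names, and the demo uses only two of the patients above and writes to a temporary folder.

```python
import csv
import os
import tempfile

def write_patients_tsv(patients, path):
    """Write a nested patients dictionary as tab-separated text with a header row."""
    # Field names come from the inner dictionary of the first patient
    fields = list(next(iter(patients.values())).keys())
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["surname"] + fields)
        for full_name, info in patients.items():
            row = [full_name.split()[-1]]  # the surname is the last word of the key
            for field in fields:
                value = info[field]
                # Only genuine booleans become yes/no
                if isinstance(value, bool):
                    value = "yes" if value else "no"
                row.append(value)
            writer.writerow(row)

demo_patients = {
    'Leopoldo Vidal-Ribas': {'age': 49, 'gender': 'male', 'otitis': True},
    'Noah Costa': {'age': 32, 'gender': 'non-binary', 'otitis': False},
}
demo_path = os.path.join(tempfile.mkdtemp(), "patients.txt")
write_patients_tsv(demo_patients, demo_path)
with open(demo_path, encoding="utf-8") as f:
    tsv_text = f.read()
print(tsv_text)
```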



## File content multiplier function

See problem **11 Fibonacci? Who is Fibonacci?** from '*3.3-Control_flow_tasks*'.

Write a function called `file_content_multiplier` that takes the numerical content of a .txt file and returns the product of all the numbers it contains.

The structure of the .txt that the function must read is a sequence of numbers separated by hyphens: these are the numbers to multiply.

In addition to writing the function, which should be reusable for any file with this structure, write a test file to check that the function actually works. Shall we use the Fibonacci sequence? That way we can recycle already written code, buuuut with some modifications to learn something new ;)

- **Test file preparation:** write a program that creates a file called "Fibo" with the first 10 even terms of the Fibonacci sequence separated by hyphens (review the exercise if needed).

  Use at least one higher-order function where you deem suitable.

  Is 0 odd or even? Either way, you will have to leave it aside. Can you imagine why? 


- Write a function whose only parameter is the name of the file (without a directory or .txt format), which reads the file and returns the result of multiplying all the numbers it contains.

  This time, can you please use at least two higher-order functions inside the function `file_content_multiplier`?

<br>

**Do you want a little help?** 

The solution is `10241959016027943927469729382400` 

because

`2 * 8 * 34 * 144 * 610 * 2584 * 10946 * 46368 * 196418 * 832040 = 10241959016027943927469729382400`

In [141]:
# Type the code here:
# File preparation:

f0, f1 = 0, 1
# count_even tracks how many even terms have been stored; count tracks all generated terms
count_even = count = 1
list_for_string = list()

while count_even <= 10:
    #print('Term',count, ' --> ', f0)

    fn = f1 + f0
    f0 = f1
    f1 = fn
    
    count += 1

    if f0 % 2 == 0:
        #print(count_even,f0)
        count_even += 1
        list_for_string.append(f0)

list_of_string = list(map(str, list_for_string))

string_of_strings = '-'.join(list_of_string)

print(string_of_strings, count, count_even)

with open('Files/Fibo.txt', 'w') as file_Fibo:
    file_Fibo.write(string_of_strings)

2-8-34-144-610-2584-10946-46368-196418-832040 31 11


In [143]:
# Type the code here:
# file_content_multiplier function

from functools import reduce

def file_content_multiplier(file_name):
    '''
    Opens the file Files/<file_name>.txt and computes the product of all the
    numbers inside it, which must be separated by hyphens.
    '''
    with open('Files/'+file_name+'.txt','r') as file_Fibo:
        list_of_int = list(map(int,file_Fibo.readline().split('-')))
        
        return reduce((lambda x, y: x * y), list_of_int)

In [144]:
file_content_multiplier('Fibo')

10241959016027943927469729382400
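
As a cross-check of the result, the product can be recomputed from the hyphen-separated string in two independent ways. This is only a verification sketch: `math.prod` (available since Python 3.8) is not a higher-order function, so it does not replace the `map`/`reduce` solution requested above. The variable names are chosen for the example.

```python
import math
from functools import reduce
from operator import mul

# Same content as the Fibo file written above
fibo_text = "2-8-34-144-610-2584-10946-46368-196418-832040"
numbers = list(map(int, fibo_text.split("-")))

product_with_reduce = reduce(mul, numbers)   # higher-order approach
product_with_prod = math.prod(numbers)       # direct approach, Python 3.8+

print(product_with_reduce == product_with_prod)
print(product_with_prod)
```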

## Extracting data from a database

Let us use the recently created patient .txt file to extract data of interest. Select adult patients (over 20 years of age) who are hearing healthy. Return a .txt file with that information.

Each patient must be stored on a new line and their features (surname, age, gender and otitis) must be separated by commas<a name="cite_ref-1"></a>[<sup>[1]</sup>](#cite_note-1).

Always encapsulate methods and procedures within functions. Create at least 4 functions: one to read the file, one to filter the patients of interest, and one to write the .txt file. The fourth mandatory function is up to you. Remember that functions can call functions; put this into practice. Also use as many arguments as you can in each function. This way, when the functions are executed, the original file is read and another file is produced with the selected patients and their features separated by commas. 

<br>

<a name="cite_note-1"></a> [[1]](#cite_ref-1) Comma-separated values (CSV) is a common structure for storing dataframes. Did you Google it?

In [216]:
# Type the code here:

def read_patients_file(file_name):
    '''
    Reads the patients file. 
    Returns a list of dictionaries, one per patient.
    '''
    with open(file_name, "r") as f:
        # skip the first line (header) when reading in a list
        next(f)
        # list for appending patients' dictionaries
        patients_list = []
        for line in f:
            line = line.strip()
            # fields are separated by tabs, so split on '\t' rather than on any whitespace
            surname, age, gender, otitis = line.split('\t')
            age = int(age)
            # create a dictionary for each patient
            patients_dict = {'surname': surname,
                            'age': age,
                            'gender': gender,
                            'otitis': otitis}
            patients_list.append(patients_dict)
            
        return patients_list

In [244]:
def filter_patients(list_of_patients, min_age = 20, otitis = False):
    '''
    This function requires a list of patients. 
    It returns a new list with filtered patients.
    Default arguments are min_age = 20, the minimum age at which a patient is accepted, 
    and otitis = False, which selects patients without otitis (True selects patients with otitis).
    '''
    # The file stores the diagnosis as 'yes'/'no', so translate the boolean argument
    otitis_label = 'yes' if otitis else 'no'
    filtered_patients = []
    for patient in list_of_patients:
        if patient['age'] >= min_age and patient['otitis'] == otitis_label:
            filtered_patients.append(patient)
    return filtered_patients

In [277]:
def create_patient_line(patient_dictionary):
    '''
    This function converts a dictionary to a string.
    '''
    line_list = []
    for v in patient_dictionary.values():
        line_list.append(str(v))
    line = ','.join(line_list)

    return line

def write_filt_patients_file(list_of_patients):
    '''
    This function requires a list of patients. 
    Returns a .txt file with patients.
    '''
    list_of_lines = []
    for patient in list_of_patients:
        list_of_lines.append(create_patient_line(patient))
    
    with open ('Files/patients_filtered.txt', 'w') as fp:        
        fp.write('\n'.join(list_of_lines))

In [220]:
patients_list = read_patients_file("Files/patients.txt")

# Use a new name for the result so the filter_patients function is not overwritten
filtered_patients = filter_patients(patients_list)

write_filt_patients_file(filtered_patients)

In [279]:
with open('Files/patients_filtered.txt', 'r') as f:
    print(f.read())

Costa,32,non-binary,no
Valente,67,female,no
Recolons,51,male,no
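
The read and filter steps can also be sketched with `csv.DictReader`, which uses the header line as dictionary keys, and the higher-order function `filter()`. The names `select_patients` and `to_csv_text` are hypothetical, and the demo parses a short in-memory string instead of the real file.

```python
import csv
import io

def select_patients(tsv_text, min_age=20, otitis="no"):
    """Return the rows of a tab-separated patient table that pass both filters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")

    def keep(row):
        # Every field is read as a string, so the age is converted before comparing
        return int(row["age"]) >= min_age and row["otitis"] == otitis

    return list(filter(keep, reader))

def to_csv_text(rows, fields=("surname", "age", "gender", "otitis")):
    """Join the selected fields of each row with commas, one row per line."""
    return "\n".join(",".join(row[field] for field in fields) for row in rows)

demo = ("surname\tage\tgender\totitis\n"
        "Vidal-Ribas\t49\tmale\tyes\n"
        "Sagnier\t17\tfemale\tno\n"
        "Costa\t32\tnon-binary\tno\n")
selected = select_patients(demo)
print(to_csv_text(selected))
```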


## Cryptocurrency

**Text preparation:** Run the following cells:

**Contents:** Evolution of cryptocurrency prices in 2022, 2021, 2020, 2019 and 2018 (in this order)

In [281]:
lines = ['Bitcoin: 47_345.22 - 29_374.15 - 7_200.17 - 3_843.52 - 15_599.20',
         'Ethereum: 3_829.57 - 975.51 - 134.17 - 155.05 - 772.64',
         'Tether: 1.0005 - 1.0006 - 0.9998 - 1.0172 - 1.0049',
         'BNB: 531.40 - 38.24 - 13.03 - 6.1886 - 8.4146',
         'XRP: 0.8591 - 0.2374 - 0.1944 - 0.3602 - 3.0487',
         'Monero: 250.21 - 138.06 - 45.75 - 49.82 - 412.06']

with open('Files/cryptocurrency.txt', 'w') as f:
    for line in lines:
        f.write(line)
        f.write('\n')

In [282]:
with open('Files/cryptocurrency.txt', 'r') as f:
    print(f.read())

Bitcoin: 47_345.22 - 29_374.15 - 7_200.17 - 3_843.52 - 15_599.20
Ethereum: 3_829.57 - 975.51 - 134.17 - 155.05 - 772.64
Tether: 1.0005 - 1.0006 - 0.9998 - 1.0172 - 1.0049
BNB: 531.40 - 38.24 - 13.03 - 6.1886 - 8.4146
XRP: 0.8591 - 0.2374 - 0.1944 - 0.3602 - 3.0487
Monero: 250.21 - 138.06 - 45.75 - 49.82 - 412.06



**Task:**

In the file called cryptocurrency.txt you can find the annual evolution of prices of 6 different cryptocurrencies between 2018 and 2022.

Write an algorithm that reads this file, calculates the average value of each cryptocurrency in the 5 recorded years, and compares this value with its price in 2018. As output, all cryptocurrencies should be classified in 3 different cases. Each case must be characterized by one of the following texts that must be printed on screen:

- The average price of .....crypto_name..... in the last 5 years fell more than 20% compared to its value in 2018.

- The average price of .....crypto_name..... in the last 5 years rose more than 20% compared to its value in 2018.

- The average price of .....crypto_name..... in the last 5 years remained more or less constant compared to its value in 2018.

depending on what has happened with each cryptocurrency.

In addition, the algorithm must create a file with the printed information.

**You must use at least one higher-order function.** 

The one you like most!

<br/>

**Tip:** You can take advantage of [Regular Expression Operations](https://docs.python.org/3/library/re.html) to split a line by more than one character. See how to do it on Google. Be careful! The syntax is slightly different when you use split with this module.
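
To illustrate the tip, here is a minimal sketch of `re.split` applied to one line of the file. Unlike `str.split`, the pattern comes first and the text to split is the second argument. The variable names are chosen for the example.

```python
import re

line = "BNB: 531.40 - 38.24 - 13.03 - 6.1886 - 8.4146"

# The pattern ': | - ' means: split on ': ' OR on ' - '
parts = re.split(r": | - ", line)
name = parts[0]
prices = [float(p) for p in parts[1:]]
print(name, prices)

# float() accepts underscores between digits (Python 3.6+),
# so values such as '47_345.22' need no extra cleaning
print(float("47_345.22"))
```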

In [284]:
# Type the code here:
import re

# Output file name chosen for this solution; the task does not prescribe one
report_path = 'Files/cryptocurrency_report.txt'

with open('Files/cryptocurrency.txt', 'r') as cripto_file, open(report_path, 'w') as report_file:

    for cripto in cripto_file:
        # Each line is split by two different separators: ': ' and ' - '
        cripto_list = re.split(': | - ', cripto.strip())
        name = cripto_list[0]
        value = list(map(float, cripto_list[1:]))
        mean_cripto = sum(value)/len(value)

        # The last value of each line is the 2018 price
        if mean_cripto < value[-1]*0.8:
            trend = 'fell more than 20%'
        elif mean_cripto > value[-1]*1.20:
            trend = 'rose more than 20%'
        else:
            trend = 'remained more or less constant'

        # Print the message and store it in the report file, as the task requires
        message = f'The average price of {name} in the last 5 years {trend} compared to its value in 2018.'
        print(message)
        report_file.write(message + '\n')

The average price of Bitcoin in the last 5 years rose more than 20% compared to its value in 2018.
The average price of Ethereum in the last 5 years rose more than 20% compared to its value in 2018.
The average price of Tether in the last 5 years remained more or less constant compared to its value in 2018.
The average price of BNB in the last 5 years rose more than 20% compared to its value in 2018.
The average price of XRP in the last 5 years fell more than 20% compared to its value in 2018.
The average price of Monero in the last 5 years fell more than 20% compared to its value in 2018.
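
The comparison itself can be isolated in a small function, which makes the 20% threshold easy to test and to change. This is a sketch under the assumption that the last value of each list is the 2018 price, as in the file above; `classify_trend` and its `tolerance` parameter are hypothetical names.

```python
def classify_trend(prices, tolerance=0.20):
    """Compare the mean of `prices` with the last value (the 2018 price)."""
    mean_price = sum(prices) / len(prices)
    reference = prices[-1]
    if mean_price < reference * (1 - tolerance):
        return "fell"
    if mean_price > reference * (1 + tolerance):
        return "rose"
    return "constant"

# Values copied from the cryptocurrency file
bitcoin = [47345.22, 29374.15, 7200.17, 3843.52, 15599.20]
tether = [1.0005, 1.0006, 0.9998, 1.0172, 1.0049]
xrp = [0.8591, 0.2374, 0.1944, 0.3602, 3.0487]

for name, prices in [("Bitcoin", bitcoin), ("Tether", tether), ("XRP", xrp)]:
    print(name, classify_trend(prices))
```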
