# Handling external files

Python has a built-in function called **open** that lets us open files such as .txt or .csv. Binary formats such as .xlsx can also be opened this way (in binary mode), but only as raw bytes.

That said, depending on the file type, Python offers dedicated modules that make each format more convenient to work with.

Examples:
- csv for CSV files (comma-separated values)
- json for JSON files (JavaScript Object Notation)

The examples below illustrate each case.

## Accessing the documentation

In [None]:
help(open)

## Classic reading/writing

### Reading

In [None]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Raw strings (r prefix) are a good idea for explicit paths on Windows.

txt_file = open(filename, mode = 'r')
content = txt_file.read()

txt_file.seek(0) # Rewind to the start instead of reopening (reopening would leak the previous handle)
content_first_line = txt_file.readline()

txt_file.seek(0)
content_first_second_lines = txt_file.readlines(19) # Reads whole lines until the total number of characters exceeds the hint | hint=-1 reads everything

txt_file.close() # Close the file explicitly once it is no longer needed

print(f'{content=} \n{content_first_line=} \n{content_first_second_lines=}')
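The cell above depends on a file that exists only on the author's machine. As a minimal, self-contained sketch of the same three read calls, the block below writes a small temporary file first; the file contents and the variable names are assumptions made for illustration, not part of the course data.

```python
import os
import tempfile

# Hypothetical sample file: three lines of 9 characters plus a newline each.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('AAAAAAAAA\nBBBBBBBBB\nCCCCCCCCC\n')
    sample_path = tmp.name

with open(sample_path, mode='r') as txt_file:
    whole = txt_file.read()              # the entire file as one string
    txt_file.seek(0)                     # rewind to the beginning
    first_line = txt_file.readline()     # one line, trailing newline included
    txt_file.seek(0)
    some_lines = txt_file.readlines(15)  # whole lines until the running total exceeds 15
    txt_file.seek(0)
    stripped = [line.rstrip('\n') for line in txt_file]  # iterate line by line

os.remove(sample_path)

print(f'{whole=} \n{first_line=} \n{some_lines=} \n{stripped=}')
```

Iterating over the file object directly, as in the last step, is usually the most memory-friendly option for large files because only one line is held at a time.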

### Reading and writing

In [None]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Raw strings (r prefix) are a good idea for explicit paths on Windows.

txt_file = open(filename, mode = 'r+') # Read and write
txt_file.write(txt_file.read() + '\n') # read() leaves the position at the end, so this appends a copy of the content
txt_file.write('E' * 18 + '\n')
txt_file.close()
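Where a write lands in `r+` mode depends on the current file position. The sketch below, which uses a temporary file with made-up contents, contrasts writing right after opening (overwrites from the start) with writing after a full `read()` (lands at the end).

```python
import os
import tempfile

# Hypothetical sample file used only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('abcdef\n')
    sample_path = tmp.name

# 'r+' opens an existing file for reading and writing, positioned at the start.
with open(sample_path, mode='r+') as txt_file:
    txt_file.write('XY')       # overwrites the first two characters in place

with open(sample_path, mode='r+') as txt_file:
    txt_file.read()            # moves the position to the end of the file
    txt_file.write('tail\n')   # so this write lands after the existing text

with open(sample_path) as txt_file:
    result = txt_file.read()

os.remove(sample_path)
print(repr(result))
```

Note that `r+` does not create the file: opening a path that does not exist raises `FileNotFoundError`.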

### Append (if the file already exists, writes at the end)

In [None]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Raw strings (r prefix) are a good idea for explicit paths on Windows.

txt_file = open(filename, mode = 'a')
txt_file.write('F' * 18 + '\n')
txt_file.close()
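To make the difference between `'a'` and `'w'` concrete, here is a small sketch on a throwaway file; the file name `log.txt` and its contents are illustrative assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, 'log.txt')  # hypothetical file name

    with open(log_path, mode='a') as f:   # 'a' creates the file if it is missing
        f.write('first\n')
    with open(log_path, mode='a') as f:   # a second append keeps the earlier text
        f.write('second\n')
    with open(log_path) as f:
        after_append = f.read()

    with open(log_path, mode='w') as f:   # 'w' truncates the file before writing
        f.write('third\n')
    with open(log_path) as f:
        after_write = f.read()

print(f'{after_append=} \n{after_write=}')
```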

## The **with open** structure

In [None]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Raw strings (r prefix) are a good idea for explicit paths on Windows.

with open(filename, 'r') as txt_file:
    content = txt_file.read()

print(content)
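The main benefit of `with open` is that the file is closed automatically when the block ends, even if an exception is raised inside it. The sketch below checks this through the `closed` attribute; the temporary file and the simulated error are assumptions made for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file used only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('hello\n')
    sample_path = tmp.name

with open(sample_path, 'r') as txt_file:
    content = txt_file.read()
closed_after_block = txt_file.closed      # the file was closed on leaving the block

try:
    with open(sample_path, 'r') as failing_file:
        raise ValueError('simulated failure')
except ValueError:
    pass
closed_after_error = failing_file.closed  # closed even though the block raised

os.remove(sample_path)
print(f'{closed_after_block=} \n{closed_after_error=}')
```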

## Handling files with specific modules

### CSV

In [None]:
import csv

csv_path = r'C:\Users\sergi\Documents\repos\python_course\data\data.csv'

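## READ
with open(csv_path, newline = '') as csv_file: # The csv module recommends opening files with newline = ''
    reader = csv.DictReader(csv_file, delimiter = ',')
    for row in reader:
        print(row)

## WRITE
# Appends a copy of the last row read above; this assumes the file has at least one data row.
with open(csv_path, 'a', newline = '') as csv_file:
    spamwriter = csv.DictWriter(csv_file, delimiter = ',', fieldnames = list(row.keys()))
    spamwriter.writerow(row)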
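As a self-contained sketch of a full CSV round trip, the block below writes a header and two rows, appends a third, and reads everything back. The column names and values are invented for illustration; note that `DictReader` returns every value as a string.

```python
import csv
import os
import tempfile

rows = [{'name': 'Ana', 'age': '31'}, {'name': 'Luis', 'age': '27'}]  # made-up data

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv_path = os.path.join(tmp_dir, 'people.csv')  # hypothetical file name

    # Create the file: header first, then the data rows.
    with open(demo_csv_path, 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['name', 'age'])
        writer.writeheader()
        writer.writerows(rows)

    # Append one more row without rewriting the header.
    with open(demo_csv_path, 'a', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['name', 'age'])
        writer.writerow({'name': 'Eva', 'age': '45'})

    # Read everything back as a list of dictionaries keyed by the header.
    with open(demo_csv_path, newline='') as csv_file:
        read_back = list(csv.DictReader(csv_file))

for row in read_back:
    print(row)
```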

### JSON

In [None]:
import json

json_path = r'C:\Users\sergi\Documents\repos\python_course\data\config.json'

## READ
with open(json_path) as json_file:
    data = json.load(json_file)

print(data)

## WRITE
with open(json_path, 'w') as json_file:
    data['WRITE'] = None
    json.dump(data, json_file) # json.dump returns None, so its result must not be assigned back to data
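The following sketch shows a JSON round trip on a temporary file, including how Python's `None` is stored as `null`. The dictionary contents and the file name are assumptions chosen for the example; `indent=2` is optional and only makes the file easier to read.

```python
import json
import os
import tempfile

config = {'name': 'demo', 'retries': 3, 'WRITE': None}  # made-up configuration

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_json_path = os.path.join(tmp_dir, 'config.json')  # hypothetical file name

    with open(demo_json_path, 'w') as json_file:
        json.dump(config, json_file, indent=2)  # writes to the file, returns None

    with open(demo_json_path) as json_file:
        raw_text = json_file.read()    # the JSON text as stored on disk
        json_file.seek(0)
        loaded = json.load(json_file)  # parsed back into Python objects

print(raw_text)
print(loaded)
```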

### YAML

In [None]:
import yaml

yaml_path = r'C:\Users\sergi\Documents\repos\python_course\data\config.yml'

## READ
with open(yaml_path, "r") as yml_file:
    try:
        data = yaml.safe_load(yml_file)
        print(data)
    except yaml.YAMLError as exc:
        print(exc)
    
## WRITE
with open(yaml_path, "w") as yml_file: # 'w' replaces the file; 'a' would append a second copy of every existing key
    try:
        data['WRITE'] = None
        yaml.dump(data, yml_file)
    except yaml.YAMLError as exc:
        print(exc)

### TIF

In [None]:
import rasterio
from pprint import pprint

tif_path = r'C:\Users\sergi\Documents\repos\python_course\data\rgb.tif'

## READ
with rasterio.open(tif_path, 'r') as src:
    data = src.read()
    profile = src.profile
    pprint(dict(profile)) # pprint on an f-string only prints one quoted string; pass the mapping itself
    print(f'{data.shape=}')

## WRITE
with rasterio.open(tif_path, 'w', **profile) as dst:
    dst.write(data)

### Excel

In [30]:
import openpyxl

workbook = openpyxl.load_workbook(filename = '../data/Financial Sample.xlsx')

worksheet = workbook['Sheet1']

column_headers = []
for column in worksheet.iter_cols(min_row = 1, max_row = 1, values_only = True):
    column_headers.extend(column)

print('Header')
for header in column_headers:
    print(f"{header}", end = ' | ')
print()

print('\nContent')
for row in worksheet.iter_rows(min_row = 2, max_row = 4, values_only = True):
    for value in row:
        print(f"{value}", end = ' | ')
    print()

workbook.close()

Header
Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts |  Sales | COGS | Profit | Date | Month Number | Month Name | Year | 

Content
Government | Canada | Carretera | None | 1618.5 | 3 | 20 | 32370 | 0 | 32370 | 16185 | 16185 | 2014-01-01 00:00:00 | 1 | January | 2014 | 
Government | Germany | Carretera | None | 1321 | 3 | 20 | 26420 | 0 | 26420 | 13210 | 13210 | 2014-01-01 00:00:00 | 1 | January | 2014 | 
Midmarket | France | Carretera | None | 2178 | 3 | 15 | 32670 | 0 | 32670 | 21780 | 10890 | 2014-06-01 00:00:00 | 6 | June | 2014 | 
