# Working with external files

Python has a built-in function called **open** that lets us open files such as .txt, .csv, .xlsx, ...

However, depending on the file type, Python has dedicated modules that make each kind of file easier to handle.

Examples:
- csv for CSV files (comma-separated values)
- json for JSON files (JavaScript Object Notation)

A few examples will make all of this clearer.

## Accessing the documentation

In [1]:
help(open)

Help on function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise OSError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In
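
The modes described in the help text can be tried out safely on a throwaway file. The following is a minimal sketch, assuming a hypothetical file named `modes_demo.txt` created inside a temporary directory (it is not part of the course data); it contrasts `'x'`, `'a'` and `'w'`.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory (not part of the course data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'modes_demo.txt')

    # 'x' creates the file and writes to it; it fails if the file already exists.
    with open(demo_path, mode='x') as f:
        f.write('first\n')

    try:
        open(demo_path, mode='x')
        x_failed = False
    except FileExistsError:
        x_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(demo_path, mode='a') as f:
        f.write('second\n')
    with open(demo_path, mode='r') as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(demo_path, mode='w') as f:
        f.write('third\n')
    with open(demo_path, mode='r') as f:
        after_write = f.read()

print(f'{x_failed=}\n{after_append=}\n{after_write=}')
```

Here `'x'` is the safe choice when an existing file must never be overwritten, whereas `'w'` silently discards whatever the file contained.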

## Classic reading/writing

### Reading

In [2]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Using a raw string (r prefix) is a good idea for explicit Windows paths: backslashes are not treated as escape sequences.

txt_file = open(filename, mode = 'r')
content = txt_file.read() # Read the whole file into a single string
txt_file.close() # Close each handle before opening the file again

txt_file = open(filename, mode = 'r')
content_first_line = txt_file.readline() # Read only the first line
txt_file.close()

txt_file = open(filename, mode = 'r')
content_first_second_lines = txt_file.readlines(19) # Read whole lines until the character count passes the hint | hint=-1 reads everything
txt_file.close() # Close the file explicitly once it is no longer needed

print(f'{content=} \n{content_first_line=} \n{content_first_second_lines=}')

content='AAAAAAAAAAAAAAAAAA\nBBBBBBBBBBBBBBBBBB\nCCCCCCCCCCCCCCCCCC\nDDDDDDDDDDDDDDDDDD\n' 
content_first_line='AAAAAAAAAAAAAAAAAA\n' 
content_first_second_lines=['AAAAAAAAAAAAAAAAAA\n', 'BBBBBBBBBBBBBBBBBB\n']
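
Besides `read`, `readline` and `readlines`, a file object can be iterated directly, which avoids loading the whole file into memory. The sketch below assumes a hypothetical `sample.txt` built in a temporary directory with the same layout as `Dummy.txt`; it also shows that successive reads continue from the current position.

```python
import os
import tempfile

# Hypothetical sample file that mimics the layout of Dummy.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, mode='w') as f:
        f.write('A' * 18 + '\n' + 'B' * 18 + '\n' + 'C' * 18 + '\n')

    # Iterating over the file object yields one line at a time.
    line_lengths = []
    with open(sample_path, mode='r') as f:
        for line in f:
            line_lengths.append(len(line.rstrip('\n')))

    # read(n) returns at most n characters and advances the position,
    # so the following readline() continues from there.
    with open(sample_path, mode='r') as f:
        first_chunk = f.read(5)
        rest_of_line = f.readline()

print(f'{line_lengths=}\n{first_chunk=}\n{rest_of_line=}')
```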


### Reading and writing

In [3]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Using a raw string (r prefix) is a good idea for explicit Windows paths.

txt_file = open(filename, mode = 'r+') # Read and write
txt_file.write(txt_file.read() + '\n') # read() leaves the position at the end, so a copy of the content is written after the original
txt_file.write('E' * 18 + '\n')
txt_file.close()
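
In `'r+'` mode, where a write lands depends on the current position in the file. The following sketch, which assumes a hypothetical `rplus_demo.txt` in a temporary directory, uses `seek(0)` to show the difference between writing at the end and overwriting at the start.

```python
import os
import tempfile

# Hypothetical scratch file (not part of the course data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'rplus_demo.txt')
    with open(demo_path, mode='w') as f:
        f.write('AAAA\nBBBB\n')

    with open(demo_path, mode='r+') as f:
        original = f.read()   # the position is now at the end of the file
        f.write('CCCC\n')     # so this write lands after the existing text
        f.seek(0)             # move back to the start
        f.write('ZZ')         # overwrites the first two characters in place
        f.seek(0)
        final = f.read()

print(f'{original=}\n{final=}')
```

Note that `'r+'` never truncates: writing at the start replaces characters one for one and leaves the rest of the file as it was.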

### Append (if the file already exists, writes go at the end)

In [4]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Using a raw string (r prefix) is a good idea for explicit Windows paths.

txt_file = open(filename, mode = 'a')
txt_file.write('F' * 18 + '\n')
txt_file.close()

## The **with open** construct

In [5]:
filename = r'C:\Users\sergi\Documents\repos\python_course\data\Dummy.txt' # Using a raw string (r prefix) is a good idea for explicit Windows paths.

with open(filename, 'r') as txt_file:
    content = txt_file.read()

print(content)

AAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDD
AAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDD

EEEEEEEEEEEEEEEEEE
FFFFFFFFFFFFFFFFFF
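
The main benefit of `with open` is that the file is closed automatically when the block ends, even if an exception is raised inside it. A minimal sketch, assuming a hypothetical `with_demo.txt` in a temporary directory and a simulated error:

```python
import os
import tempfile

# Hypothetical scratch file (not part of the course data).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'with_demo.txt')
    with open(demo_path, mode='w') as f:
        f.write('hello\n')

    closed_inside = None
    try:
        with open(demo_path, mode='r') as txt_file:
            closed_inside = txt_file.closed  # still open inside the block
            raise ValueError('simulated failure while processing the file')
    except ValueError:
        pass

    # The with statement closed the file despite the exception.
    closed_after = txt_file.closed

print(f'{closed_inside=}\n{closed_after=}')
```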



## Handling files with specific modules

### CSV

In [8]:
import csv

csv_path = r'C:\Users\sergi\Documents\repos\python_course\data\data.csv'

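## READ
# newline = '' is what the csv module documentation recommends when opening CSV files
with open(csv_path, newline = '') as csv_file:
    reader = csv.DictReader(csv_file, delimiter = ',')
    for row in reader:
        print(row)

## WRITE
# Append a copy of the last row read above; `row` only exists if the file had at least one data row
with open(csv_path, 'a', newline = '') as csv_file:
    writer = csv.DictWriter(csv_file, delimiter = ',', fieldnames = list(row.keys()))
    writer.writerow(row)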

{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
{'UUID': '65.tif', 'LONG': '36.47207866666667', 'LAT': '-6.249120222222222'}
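
When a CSV file is created from scratch, the header row has to be written explicitly. The sketch below reuses the column names seen in the output above (`UUID`, `LONG`, `LAT`) with made-up values, and writes to a hypothetical `points.csv` in a temporary directory before reading it back.

```python
import csv
import os
import tempfile

# Made-up rows; only the column names come from the example above.
rows = [
    {'UUID': 'a.tif', 'LONG': '1.5', 'LAT': '-2.5'},
    {'UUID': 'b.tif', 'LONG': '3.0', 'LAT': '4.0'},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv_path = os.path.join(tmp_dir, 'points.csv')

    # newline='' lets the csv module control line endings itself.
    with open(demo_csv_path, mode='w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['UUID', 'LONG', 'LAT'])
        writer.writeheader()    # first line: UUID,LONG,LAT
        writer.writerows(rows)  # one line per dictionary

    with open(demo_csv_path, newline='') as csv_file:
        read_back = list(csv.DictReader(csv_file))

print(read_back)
```

`DictReader` returns every value as a string, so numeric columns such as `LONG` and `LAT` need an explicit `float(...)` conversion before any arithmetic.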


### JSON

In [7]:
import json

json_path = r'C:\Users\sergi\Documents\repos\python_course\data\config.json'

## READ
with open(json_path) as json_file:
    data = json.load(json_file)

print(data)

## WRITE
with open(json_path, 'w') as json_file:
    data['WRITE'] = None
    json.dump(data, json_file) # json.dump returns None, so its result must not be assigned back to data

{'ok': True, 'fail': False}
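
To see how Python values are mapped to JSON, it helps to inspect the raw text that `json.dump` produces. A minimal sketch, assuming a hypothetical `config_demo.json` in a temporary directory and a dictionary shaped like the one above:

```python
import json
import os
import tempfile

# Dictionary shaped like the example above, including the added 'WRITE' key.
config = {'ok': True, 'fail': False, 'WRITE': None}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_json_path = os.path.join(tmp_dir, 'config_demo.json')

    # indent=2 makes the file easier for humans to read.
    with open(demo_json_path, mode='w') as json_file:
        json.dump(config, json_file, indent=2)

    with open(demo_json_path) as json_file:
        raw_text = json_file.read()

# True/False/None become true/false/null in the file and are restored on load.
restored = json.loads(raw_text)
print(raw_text)
print(restored)
```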


### YAML

In [8]:
import yaml

yaml_path = r'C:\Users\sergi\Documents\repos\python_course\data\config.yml'

## READ
with open(yaml_path, "r") as yml_file:
    try:
        data = yaml.safe_load(yml_file)
        print(data)
    except yaml.YAMLError as exc:
        print(exc)
    
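## WRITE
with open(yaml_path, "w") as yml_file: # 'w' replaces the file; with 'a' the whole document would be appended after the existing one, duplicating every key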
    try:
        data['WRITE'] = None
        yaml.dump(data, yml_file)
    except yaml.YAMLError as exc:
        print(exc)

{'n_processed': 1, 'logs': {0: 'OK', 1: 'OK', 2: 'OK', 3: 'KO', 4: 'OK', 5: 'OK', 6: 'KO', 7: 'OK', 8: 'KO', 9: 'OK'}}


### TIF

In [9]:
import rasterio
from pprint import pprint

tif_path = r'C:\Users\sergi\Documents\repos\python_course\data\rgb.tif'

## READ
with rasterio.open(tif_path, 'r') as src:
    data = src.read()
    profile = src.profile
    pprint(f'{profile=}')
    print(f'{data.shape=}')

## WRITE
with rasterio.open(tif_path, 'w', **profile) as dst:
    dst.write(data)

("profile={'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 433, "
 "'height': 578, 'count': 4, 'crs': None, 'transform': Affine(1.0, 0.0, 0.0,\n"
 "       0.0, 1.0, 0.0), 'blockysize': 7, 'tiled': False, 'compress': 'lzw', "
 "'interleave': 'pixel'}")
data.shape=(4, 578, 433)




### Excel

In [10]:
import openpyxl

workbook = openpyxl.load_workbook(filename = '../data/Financial Sample.xlsx')

worksheet = workbook['Sheet1']

# Collect the header cells from the first row
column_headers = []
for column in worksheet.iter_cols(min_row = 1, max_row = 1, values_only = True):
    column_headers.extend(column)

print('Header')
for header in column_headers:
    print(f"{header}", end = ' | ')
print()

# Print the first three data rows (rows 2 to 4 of the sheet)
print('\nContent')
for row in worksheet.iter_rows(min_row = 2, max_row = 4, values_only = True):
    for value in row:
        print(f"{value}", end = ' | ')
    print()

workbook.close()

Header
Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts |  Sales | COGS | Profit | Date | Month Number | Month Name | Year | 

Content
Government | Canada | Carretera | None | 1618.5 | 3 | 20 | 32370 | 0 | 32370 | 16185 | 16185 | 2014-01-01 00:00:00 | 1 | January | 2014 | 
Government | Germany | Carretera | None | 1321 | 3 | 20 | 26420 | 0 | 26420 | 13210 | 13210 | 2014-01-01 00:00:00 | 1 | January | 2014 | 
Midmarket | France | Carretera | None | 2178 | 3 | 15 | 32670 | 0 | 32670 | 21780 | 10890 | 2014-06-01 00:00:00 | 6 | June | 2014 | 
