Workshop 2.2 Course: Programming Fundamentals

* Instructor: Daniel Escobar
* Student: Danilo Rodriguez Arango
* Topic: Concurrency and File Handling

1. Multiprocessing

In [1]:
%run "./download_files.py"

Downloaded 5 files in 0.9916422367095947 seconds


2. Threading

In [2]:
import concurrent.futures
import requests
import threading
import time

csv_urls = [
    "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv",
    "https://raw.githubusercontent.com/plotly/datasets/master/iris.csv",
    "https://raw.githubusercontent.com/plotly/datasets/master/tips.csv",
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv",
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv",
]

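# Thread-local storage: each worker thread keeps its own requests.Session,
# because a Session is not guaranteed to be safe to share across threads
thread_local = threading.local()

def get_session():
    # Create the session on first use in this thread, then reuse it
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def download_site(url):
    session = get_session()
    with session.get(url) as response:
        # Raise on HTTP error statuses instead of counting an error page as a download
        response.raise_for_status()
        print(f"Read {len(response.content)} bytes from {url}")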

def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Consume the iterator so exceptions raised in worker threads are re-raised here
        list(executor.map(download_site, sites))

def main():
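    # perf_counter() is a monotonic, high-resolution clock meant for measuring durations
    start_time = time.perf_counter()
    download_all_sites(csv_urls)
    duration = time.perf_counter() - start_time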
    print(f"Downloaded {len(csv_urls)} files in {duration} seconds")

if __name__ == "__main__":
    main()


Read 2350 bytes from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv
Read 4601 bytes from https://raw.githubusercontent.com/plotly/datasets/master/iris.csv
Read 7943 bytes from https://raw.githubusercontent.com/plotly/datasets/master/tips.csv
Read 3393 bytes from https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
Read 57018 bytes from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv
Downloaded 5 files in 0.0705878734588623 seconds
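
The thread-pool pattern used in this section can also return values to the caller instead of printing inside the worker. What follows is a minimal sketch, not the notebook's own code: `fake_download` and the `example.com` URLs are hypothetical stand-ins (the function sleeps instead of making a request) so the example runs without network access.

```python
import concurrent.futures
import time

def fake_download(url):
    # Hypothetical stand-in for a real request: sleep to simulate network
    # latency, then return (url, size). The size is only the URL length.
    time.sleep(0.1)
    return url, len(url)

def download_all(urls, max_workers=5):
    # executor.map yields results in input order and re-raises any
    # worker exception when the corresponding result is consumed.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_download, urls))

# Assumed placeholder URLs, not the datasets used above
urls = [f"https://example.com/file{i}.csv" for i in range(5)]

start = time.perf_counter()
results = download_all(urls)
elapsed = time.perf_counter() - start

for url, size in results:
    print(f"Read {size} bytes from {url}")
print(f"Downloaded {len(results)} files in {elapsed:.3f} seconds")
```

Because the results come back in input order, the caller can pair each one with its URL even though the threads finish in an arbitrary order, which is why the printed lines in the output above do not follow the order of `csv_urls`.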


3. Asynchronous (asyncio)

In [3]:
# pip install aiohttp nest_asyncio

import asyncio
import aiohttp
import time
import nest_asyncio

# Apply nest_asyncio so asyncio.run() can be called from the notebook's already-running event loop
nest_asyncio.apply()

csv_urls = [
    "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv",
    "https://raw.githubusercontent.com/plotly/datasets/master/iris.csv",
    "https://raw.githubusercontent.com/plotly/datasets/master/tips.csv",
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv",
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv",
]

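async def download_site(url, session):
    # Each await yields control to the event loop so other downloads can progress
    async with session.get(url) as response:
        # Raise on HTTP error statuses instead of counting an error page as a download
        response.raise_for_status()
        content = await response.read()
        print(f"Read {len(content)} bytes from {url}")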

async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = [download_site(url, session) for url in sites]
        await asyncio.gather(*tasks)

def main():
    start_time = time.perf_counter()

    # asyncio.run() works here only because nest_asyncio was applied;
    # otherwise the notebook raises an "event loop is already running" error
    asyncio.run(download_all_sites(csv_urls))

    duration = time.perf_counter() - start_time
    print(f"Downloaded {len(csv_urls)} files in {duration} seconds")

if __name__ == "__main__":
    main()


Read 57018 bytes from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv
Read 3393 bytes from https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
Read 2350 bytes from https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv
Read 7943 bytes from https://raw.githubusercontent.com/plotly/datasets/master/tips.csv
Read 4601 bytes from https://raw.githubusercontent.com/plotly/datasets/master/iris.csv
Downloaded 5 files in 0.05017542839050293 seconds
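
The `asyncio.gather` pattern used in this section can likewise return results, and an `asyncio.Semaphore` can cap how many requests are in flight at once. This is a rough sketch under stated assumptions: `fake_download`, the `limit` parameter, and the `example.com` URLs are hypothetical, and `asyncio.sleep` stands in for the `aiohttp` request so the code runs offline. It is written as a plain script; inside a notebook, the `asyncio.run` call would still need `nest_asyncio` (or a top-level `await`).

```python
import asyncio

async def fake_download(url, semaphore):
    # Hypothetical stand-in for session.get(): asyncio.sleep simulates latency.
    async with semaphore:
        await asyncio.sleep(0.1)
        return url, len(url)

async def download_all(urls, limit=3):
    # At most `limit` coroutines hold the semaphore at the same time.
    semaphore = asyncio.Semaphore(limit)
    tasks = [fake_download(url, semaphore) for url in urls]
    # gather returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    return await asyncio.gather(*tasks)

# Assumed placeholder URLs, not the datasets used above
urls = [f"https://example.com/file{i}.csv" for i in range(5)]
results = asyncio.run(download_all(urls))

for url, size in results:
    print(f"Read {size} bytes from {url}")
```

Limiting concurrency matters little for five small files, but it is a common courtesy to the server once the URL list grows.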
