In this file you can write whatever you consider appropriate. We recommend describing your solution in detail, along with all the assumptions you are making. Here you can run the functions you defined in the other files of the src folder and measure time, memory usage, etc.

### Set up the file path to the dataset containing tweets related to the farmer protests.

In [1]:
file_path = "tweets.json/farmers-protest-tweets-2021-2-4.json"

Install the required libraries


In [None]:
!pip install ujson
!pip install memory_profiler
!pip show ujson
!pip show memory_profiler


Name: ujson
Version: 5.8.0
Summary: Ultra fast JSON encoder and decoder for Python
Home-page: https://github.com/ultrajson/ultrajson
Author: Jonas Tarnstrom
Author-email: 
License: 
Location: C:\Users\allan\AppData\Roaming\Python\Python311\site-packages
Requires: 
Required-by: prefect
Name: memory-profiler
Version: 0.61.0
Summary: A module for monitoring memory usage of a python program
Home-page: https://github.com/pythonprofilers/memory_profiler
Author: Fabian Pedregosa
Author-email: f@bianp.net
License: BSD
Location: C:\Users\allan\AppData\Roaming\Python\Python311\site-packages
Requires: psutil
Required-by: 


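If you encounter an error with memory_profiler and/or ujson, you can try uninstalling them with pip uninstall <library-name> and then running the following cell. This approach worked for me when the kernel was giving me problems with these imports. The %pip magic installs packages into the environment of the running kernel, whereas !pip runs whichever pip comes first on the system path, which may belong to a different Python.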

In [None]:
%pip install memory_profiler
%pip install ujson
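After reinstalling, a quick way to check that the running kernel can actually see both packages is to look them up with importlib. This is a minimal sketch; check_packages is a hypothetical helper, not part of the notebook. Printing sys.executable shows which Python interpreter the kernel uses, which helps when !pip installed a package into a different one.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version


def check_packages(modules):
    """Report whether each module is importable from this kernel and its version.

    `modules` maps the import name to the distribution name, because they can
    differ (memory_profiler is published as memory-profiler).
    """
    report = {}
    for import_name, dist_name in modules.items():
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None  # not importable from this interpreter
            continue
        try:
            report[import_name] = version(dist_name)
        except PackageNotFoundError:
            report[import_name] = "unknown"  # importable, but no package metadata found
    return report


print("Kernel interpreter:", sys.executable)
print(check_packages({"ujson": "ujson", "memory_profiler": "memory-profiler"}))
```

A None entry means the kernel cannot import that module, which usually calls for %pip install followed by a kernel restart.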

## Measure the execution time and memory usage of the functions


In [None]:
import time
import timeit

def measure_execution_time(func, *args, **kwargs):
    """
    Measures the time a function takes to execute.

    Parameters:
        func (callable): The function to measure.
        *args: Positional arguments to pass to the function.
        **kwargs: Keyword arguments to pass to the function.

    Returns:
        result: The result of the function.
        exec_time: The execution time in seconds.
    """
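    # perf_counter is a monotonic, high-resolution clock intended for measuring
    # elapsed time; time.time() follows the wall clock and can jump if it is adjusted.
    start_time = time.perf_counter()
    result = func(*args, **kwargs)
    end_time = time.perf_counter()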
    
    exec_time = end_time - start_time
    return result, exec_time

#### In this cell we import q1_time and q1_memory from the custom modules in src and measure the execution time and memory usage of each.
- Keep in mind that we use three different tools here: our own measure_execution_time function, the standard-library timeit module, and measure_memory.


In [None]:
from src.q1_time import q1_time
from src.q1_memory import q1_memory
from measure_memory import measure_memory

# performance-optimized

# ------ With measure execution time ------- #
result_time, exec_time_time = measure_execution_time(q1_time, file_path)
print(f"q1_time execution time[With own function]: {exec_time_time} seconds")
print(f"Result: {result_time}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q1_time(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q1_time, file_path)
print(f"q1_time result: {result}")
print(f"Memory used: {mem_used} MB")

# memory-optimized

# ------ With measure execution time ------- #
result_memory, exec_time_memory = measure_execution_time(q1_memory, file_path)
print(f"q1_memory execution time[With own function]: {exec_time_memory} seconds")
print(f"Result: {result_memory}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q1_memory(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q1_memory, file_path)
print(f"q1_memory result: {result}")
print(f"Memory used: {mem_used} MB")

q1_time execution time[With own function]: 6.338999032974243 seconds
Result: [(datetime.date(2021, 2, 12), 'RanbirS00614606'), (datetime.date(2021, 2, 13), 'MaanDee08215437'), (datetime.date(2021, 2, 17), 'RaaJVinderkaur'), (datetime.date(2021, 2, 16), 'jot__b'), (datetime.date(2021, 2, 14), 'rebelpacifist'), (datetime.date(2021, 2, 18), 'neetuanjle_nitu'), (datetime.date(2021, 2, 15), 'jot__b'), (datetime.date(2021, 2, 20), 'MangalJ23056160'), (datetime.date(2021, 2, 23), 'Surrypuria'), (datetime.date(2021, 2, 19), 'Preetm91')]
Execution time[With Timeit]: 5.984733500000402 seconds
q1_time result: [(datetime.date(2021, 2, 12), 'RanbirS00614606'), (datetime.date(2021, 2, 13), 'MaanDee08215437'), (datetime.date(2021, 2, 17), 'RaaJVinderkaur'), (datetime.date(2021, 2, 16), 'jot__b'), (datetime.date(2021, 2, 14), 'rebelpacifist'), (datetime.date(2021, 2, 18), 'neetuanjle_nitu'), (datetime.date(2021, 2, 15), 'jot__b'), (datetime.date(2021, 2, 20), 'MangalJ23056160'), (datetime.date(2021,

In this cell, timeit is used alongside our own timing function and measure_memory. timeit is the more standard way to time a function: by default it uses time.perf_counter() and temporarily disables garbage collection while timing.

#### Performance and Memory Optimization Explanation:
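- Performance-Optimized Approach: Prioritizes execution speed, possibly by parallelizing work or by keeping the data in memory to avoid repeated reads, at the cost of a larger memory footprint. In the q2 measurements, for example, q2_time used about 514 MB versus about 90 MB for q2_memory.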

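- Memory-Optimized Approach: Focuses on reducing the memory footprint, possibly by processing the data line by line or in smaller chunks, or by offloading intermediate data to disk-backed storage such as the standard-library shelve module instead of keeping large datasets in memory. The trade-off is speed: q2_memory took about 4.3 seconds versus about 3.0 seconds for q2_time.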
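The difference in memory footprint between the two approaches can be sketched with a simple counting task. This is a minimal illustration, not the code in src: count_users_in_memory and count_users_streaming are hypothetical names, and the sketch assumes the dataset is newline-delimited JSON (one tweet per line) with a user.username field on each tweet.

```python
import io
import json
from collections import Counter

# Assumption: one JSON tweet per line, each with {"user": {"username": ...}}.


def count_users_in_memory(fp):
    """Parse every tweet into a list first.

    Convenient when the data is reused across several passes, but peak memory
    grows with the size of the file.
    """
    tweets = [json.loads(line) for line in fp if line.strip()]
    return Counter(t["user"]["username"] for t in tweets)


def count_users_streaming(fp):
    """Parse one line at a time and keep only the running counts.

    Peak memory grows with the number of distinct users, not with the file size.
    """
    counts = Counter()
    for line in fp:
        if line.strip():
            counts[json.loads(line)["user"]["username"]] += 1
    return counts


sample = (
    '{"user": {"username": "a"}}\n'
    '{"user": {"username": "b"}}\n'
    '{"user": {"username": "a"}}\n'
)
print(count_users_in_memory(io.StringIO(sample)).most_common(1))  # [('a', 2)]
print(count_users_streaming(io.StringIO(sample)).most_common(1))  # [('a', 2)]
```

With the real dataset, the file object would come from open(file_path, encoding="utf-8") instead of io.StringIO.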

#### Additional Considerations:
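- timeit is useful because it provides an accurate and straightforward way to time code. Note, however, that timeit.timeit returns the total time for number runs rather than an average, and with number=1, as used here, each function runs only once, so a single measurement can still be affected by background processes. For more stable numbers, run several rounds with timeit.repeat and take the minimum, as the timeit documentation recommends.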

- Our custom timing function, measure_execution_time, returns the function's result together with the elapsed time, which timeit does not, and it can be extended to record other metrics or to add pre-processing or post-processing around the call.
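A minimal sketch of the repeat-and-take-the-minimum approach with the standard timeit module. best_of is a hypothetical helper, not part of the notebook, and sum over a range stands in for q1_time and the other functions.

```python
import timeit


def best_of(func, *args, repeat=5, number=1):
    """Time func(*args) with timeit.repeat and return the best per-call time in seconds.

    timeit.repeat returns one total per round, where each round calls func
    `number` times. The minimum is the least noisy estimate, because background
    activity can only make a round slower, not faster.
    """
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number


# Stand-in workload so the example runs on its own.
print(f"{best_of(sum, range(100_000), repeat=5, number=10):.6f} s per call")
```

In this notebook it could be called as best_of(q1_time, file_path, repeat=3); with q1_time taking roughly 6 seconds per run, that means about 18 seconds of wall time.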

In [19]:
from src.q2_time import q2_time
from src.q2_memory import q2_memory

# performance-optimized
# ------ With measure execution time ------- #
result_time, exec_time_time = measure_execution_time(q2_time, file_path)
print(f"q2_time execution time[With own function]: {exec_time_time} seconds")
print(f"Result: {result_time}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q2_time(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q2_time, file_path)
print(f"q2_time result: {result}")
print(f"Memory used: {mem_used} MB")

# memory-optimized
# ------ With measure execution time ------- #
result_memory, exec_time_memory = measure_execution_time(q2_memory, file_path)
print(f"q2_memory execution time[With own function]: {exec_time_memory} seconds")
print(f"Result: {result_memory}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q2_memory(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q2_memory, file_path)
print(f"q2_memory result: {result}")
print(f"Memory used: {mem_used} MB")

q2_time execution time[With own function]: 2.9854962825775146 seconds
Result: [('🙏', 2123), ('😂', 633), ('🌾', 605), ('💚', 533), ('👍', 471), ('👉', 450), ('🇮🇳', 426), ('🙏🙏', 403), ('👇', 390), ('🏽', 332)]
Execution time[With Timeit]: 3.028866999999991 seconds
q2_time result: [('🙏', 2123), ('😂', 633), ('🌾', 605), ('💚', 533), ('👍', 471), ('👉', 450), ('🇮🇳', 426), ('🙏🙏', 403), ('👇', 390), ('🏽', 332)]
Memory used: 513.734375 MB
q2_memory execution time[With own function]: 4.357970952987671 seconds
Result: [('🙏', 2123), ('😂', 633), ('🌾', 605), ('💚', 533), ('👍', 471), ('👉', 450), ('🇮🇳', 426), ('🙏🙏', 403), ('👇', 390), ('🏽', 332)]
Execution time[With Timeit]: 4.298151499999221 seconds
q2_memory result: [('🙏', 2123), ('😂', 633), ('🌾', 605), ('💚', 533), ('👍', 471), ('👉', 450), ('🇮🇳', 426), ('🙏🙏', 403), ('👇', 390), ('🏽', 332)]
Memory used: 90.41015625 MB


In [None]:
from src.q3_time import q3_time
from src.q3_memory import q3_memory

# performance-optimized
# ------ With measure execution time ------- #
result_time, exec_time_time = measure_execution_time(q3_time, file_path)
print(f"q3_time execution time[With own function]: {exec_time_time} seconds")
print(f"Result: {result_time}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q3_time(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q3_time, file_path)
print(f"q3_time result: {result}")
print(f"Memory used: {mem_used} MB")

# memory-optimized
# ------ With measure execution time ------- #
result_memory, exec_time_memory = measure_execution_time(q3_memory, file_path)
print(f"q3_memory execution time[With own function]: {exec_time_memory} seconds")
print(f"Result: {result_memory}")
# ------ With timeit ------- #
execution_time = timeit.timeit(lambda: q3_memory(file_path), number=1)
print(f"Execution time[With Timeit]: {execution_time} seconds")
# ------ With measure memory ------- #
result, mem_used = measure_memory(q3_memory, file_path)
print(f"q3_memory result: {result}")
print(f"Memory used: {mem_used} MB")

q3_time execution time: 3.289499044418335 seconds
Result: [('narendramodi', 2265), ('Kisanektamorcha', 1840), ('RakeshTikaitBKU', 1644), ('PMOIndia', 1427), ('RahulGandhi', 1146), ('GretaThunberg', 1048), ('RaviSinghKA', 1019), ('rihanna', 986), ('UNHumanRights', 962), ('meenaharris', 926)]
Execution time[With Timeit]: 3.0875778999998147 seconds
Processed 1000 lines
Processed 2000 lines
...
Processed 30000 lines
...

#### Final Observations:
- When to use performance optimization? Performance-optimized versions are useful when we need to process a large volume of data quickly, and memory resources are not an issue.

- When to use memory optimization? Memory-optimized versions are better suited for environments where memory is limited or shared between multiple processes, even if it means sacrificing some execution speed.

#### Potential Improvements:

- Further optimization with pandas: pandas can be more memory-heavy, but pandas.read_json with lines=True and a chunksize reads newline-delimited JSON in chunks, and vectorized DataFrame operations such as value_counts can simplify the code and speed up some aggregations.

- Using orjson: This library is generally benchmarked as faster than ujson and could further reduce the time spent parsing (and, where needed, serializing) JSON data.
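A minimal sketch of trying orjson without making it a hard dependency: import orjson if it is installed, otherwise fall back to ujson, then to the standard json module. JSON_BACKEND and parse_lines are hypothetical names, and the sketch assumes newline-delimited JSON input.

```python
# Pick the fastest JSON parser available, falling back to the standard library.
try:
    import orjson as _fast_json  # third-party; generally the fastest of the three
    JSON_BACKEND = "orjson"
except ImportError:
    try:
        import ujson as _fast_json  # installed earlier in this notebook
        JSON_BACKEND = "ujson"
    except ImportError:
        import json as _fast_json  # always available
        JSON_BACKEND = "json"


def parse_lines(lines):
    """Yield one parsed object per non-blank line of newline-delimited JSON."""
    for line in lines:
        if line.strip():
            yield _fast_json.loads(line)


print(JSON_BACKEND)
print(list(parse_lines(['{"a": 1}', "", '{"a": 2}'])))  # [{'a': 1}, {'a': 2}]
```

All three libraries accept a str in loads, so parsing works the same either way. Note that orjson.dumps returns bytes rather than str, so orjson is not a drop-in replacement for json.dumps when serializing.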

#### A couple of things to check or improve:

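1. Memory Usage Measurement: Since memory_profiler has been installed, measure_memory should work as expected. Keep in mind that memory_profiler reports memory in MiB (printed as "MB" here) and measures the whole process, so, depending on how measure_memory computes its figure, objects still held from earlier cells (such as large results) can affect the readings. Running each measurement in a fresh kernel gives cleaner numbers.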

2. Performance: Since we are working with large datasets, it is worth checking whether q2_memory (about 4.3 seconds versus about 3.0 seconds for q2_time) or the other functions can be optimized to avoid performance bottlenecks. Profiling shows which parts of a function use the most time and memory.

3. Jupyter Environment: Ensure that you restart the kernel if needed after installing the libraries, just to make sure everything is correctly recognized by the system.
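A minimal sketch of profiling a single call for both time and memory with the standard library: cProfile shows where the time goes and tracemalloc reports the peak allocation. profile_call is a hypothetical helper. tracemalloc counts memory allocated through Python's allocator rather than the whole process, as memory_profiler does, so its numbers will not match measure_memory, and both tools slow the call down, so the timings are only useful for comparing functions with each other.

```python
import cProfile
import io
import pstats
import tracemalloc


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) once under cProfile and tracemalloc.

    Returns (result, peak_bytes, report): peak_bytes is the peak memory
    allocated through Python's allocator during the call, and report is the
    text of the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Stop both tools even if func raises, so later cells are not affected.
        profiler.disable()
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, peak_bytes, buffer.getvalue()


# Stand-in workload so the example runs on its own.
_, peak, report = profile_call(sorted, list(range(100_000, 0, -1)), top=5)
print(f"Peak traced allocation: {peak / 2**20:.2f} MiB")
print(report)
```

In the notebook, result, peak, report = profile_call(q2_memory, file_path) followed by print(report) would list the functions where q2_memory spends the most time.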