# Home assignments (Mykola)

### Requirements: 
- Python==3.11.5
- PySpark==3.5.0

## Python

### 1, object store path min/max analysis
You have a specific prefix + key structure in your objects store (can be S3, HDFS, ...), that looks like this:
`protocol://bucket/base_path/specific_path/keys`  and a key has a structure of `id=some_value/month=yyyy-MM-dd/object{1, 2, 3, ...}`

Example:

s3://my-bucket/xxx/yyy/zzz/abc/id=123/month=2019-01-01/2019-01-19T10:31:18.818Z.gz

s3://my-bucket/xxx/yyy/zzz/abc/id=123/month=2019-02-01/2019-02-19T10:32:18.818Z.gz

s3://my-bucket/xxx/yyy/zzz/abc/id=333/month=2019-03-01/2019-06-19T10:33:18.818Z.gz

s3://my-bucket/xxx/yyy/zzz/def/id=123/month=2019-10-01/2019-10-19T10:34:18.818Z.gz

s3://my-bucket/xxx/yyy/zzz/def/id=333/month=2019-11-01/2019-12-19T10:35:18.818Z.gz

You have a function `get_all_keys(bucket, full_path) -> Iterator[str]` for getting all the keys for a full path (base_path + specific_path).

Notes:
On the input you know your bucket, base_path and all the specific paths you want to generate output for.
Also as shown in the example the month subkey has format of a date, but it's always yyyy-MM-01, so effectively it only gives you information about the year and month. Objects (files) within this structure have a timestamp, but this is a timestamp of when they have been created. For illustration, the last line in the example is an object (file) that was generated at '2019-12-19T10:35:18.818Z', but data in it are for the id of '333' and month of 2019-11.

**For each specific_path (there can be many):**
 - A, calculate for each id a minimum and maximum month (there cannot be gaps between moths)
 - B, write the output to a json file
 - C, there can be gaps between months (missing months), so report them also in some appropriate structure


In [1]:
from datetime import datetime, date, timedelta
from random import randrange, randint, choice
from typing import List
from pathlib import Path

import string
import shutil
import json
import pprint

### Generate Data

To avoid using real S3, I decided to use a file system for the simulation.

- Each directory path will emulate some S3 prefix: bucket_path/base_path/specific_path.
- Inside each final directory there is a file with a list of keys: bucket_path/base_path/specific_path/keys

In [2]:
YEAR_RANGE = (2019, date.today().year)


def get_random_range(start: int, end: int) -> List[int]:
    return sorted([randint(start, end), randint(start, end)])


def get_datetime_range():
    start_year, end_year = get_random_range(*YEAR_RANGE)
    start_month, end_month = get_random_range(1, 12)
    return datetime(start_year, start_month, 1), datetime(end_year, end_month, 1)


def get_random_datetime(start, end):
    delta_days = (end - start).days
    #return start + timedelta(days=randrange(delta_days + 1), seconds=randrange(86400))
    return start + timedelta(days=randrange(delta_days + 1), seconds=datetime.now().second)


def generate_path(depth=3, path=Path()):
    if depth == 0:
        return path
    path /= choice(string.ascii_lowercase) * 3
    return generate_path(depth - 1, path)

In [3]:
def generate_keys(bucket, full_path, id_count=100, max_size=1000):
    file_path = bucket / full_path / 'keys'
    
    with file_path.open('a+') as f:
        for _ in range(id_count):
            size = randint(1, max_size)
            
            _id = f"{randrange(1, 10**3):03}"
            start_date, end_date = get_datetime_range()
            delta_days = (end_date - start_date).days
            
            for _ in range(size):
                random_datetime = get_random_datetime(start_date, end_date)
                month = random_datetime.strftime("%Y-%m-01")
                file_name = random_datetime.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
                f.write(f"id={_id}/month={month}/{file_name}.gz\n")
    
    
def generate_data(bucket_name, count_base_path=5, count_specific_path=10):
    input_data = []
    
    bucket_path = Path(bucket_name)
    
    if bucket_path.exists() and bucket_path.is_dir():
        shutil.rmtree(bucket_path)
    
    for _ in range(count_base_path):
        base_path = generate_path()
        for _ in range(count_specific_path):
            specific_path = generate_path(depth=1)
            full_path = base_path / specific_path
            Path(bucket_path / full_path).mkdir(parents=True, exist_ok=True)
            
            generate_keys(bucket_path, full_path)
            
            input_data.append((bucket_path, base_path, specific_path))
            
    return input_data
            

input_data = generate_data('my-bucket', count_base_path=3, count_specific_path=30)

### Solution

In [4]:
def get_all_keys(bucket, full_path):
    with (bucket / full_path / 'keys').open() as f:
        for line in f:
            yield line.strip()


def calc_month_range(month_min_obj, month_max_obj):
    start_year, min_month = month_min_obj
    end_year, max_month = month_max_obj
    
    month_range = set()
    
    for year in range(start_year, end_year + 1):
        
        start_month = min_month if year == start_year else 1
        end_month = max_month if year == end_year else 12
        
        for month in range(start_month, end_month + 1):
            month_range.add((year, month))
        
    return month_range


def calc_min_max(bucket_path, base_path, specific_path):
    
    result = {}
    
    for key in get_all_keys(bucket_path, base_path / specific_path):
        # maybe I should use regex and do some validation, I decided to skip
        id_data, month_data, _ = key.split('/')
        _id = int(id_data.split('id=')[-1])
        year, month, _ = month_data.split('month=')[-1].split('-')
        
        # several options here: 
        # 1) We can use some libraries to parse dates and increment by month
        # 2) Work with integers in python
        _id = int(_id)
        month_obj = (int(year), int(month))
        #_id, year, month = int(_id), int(year), int(month)
        
        if _id not in result:
            result[_id] = {
                'month_min': month_obj,
                'month_max': month_obj,
                'month_gaps': set(month_obj)
            }
        else:
            if result[_id]['month_min'] > month_obj:
                result[_id]['month_min'] = month_obj
                
            if result[_id]['month_max'] < month_obj:
                result[_id]['month_max'] = month_obj
            
            #result[_id]['month_min'] = min(result[_id]['month_min'], (year, month))
            #result[_id]['month_max'] = max(result[_id]['month_max'], (year, month))
            result[_id]['month_gaps'].add(month_obj)
                
        
    for _id in result:
        month_min = result[_id]['month_min']
        month_max = result[_id]['month_max']
        month_range = calc_month_range(month_min, month_max)
        result[_id]['month_gaps'] = month_range - result[_id]['month_gaps']
        
        template = "{}-{:02d}-01"
        
        result[_id]['month_min'] = template.format(*month_min)
        result[_id]['month_max'] = template.format(*month_max)
        result[_id]['month_gaps'] = [template.format(*month) for month in result[_id]['month_gaps']]
        
        
        
    with open(bucket_path / base_path / specific_path / 'output.json', 'w') as f:
        json.dump(obj=result, fp=f,indent=4)
    
    return result

In [5]:
for bucket_path, base_path, specific_path in input_data:
    result = calc_min_max(bucket_path, base_path, specific_path)
    pprint.pprint(result)

{20: {'month_gaps': [], 'month_max': '2019-08-01', 'month_min': '2019-03-01'},
 29: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-06-01'},
 39: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2020-01-01'},
 48: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2023-04-01'},
 56: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-08-01'},
 57: {'month_gaps': ['2020-03-01',
                     '2020-09-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2020-12-01',
                     '2022-03-01',
                     '2021-04-01',
                     '2021-01-01',
                     '2021-07-01',
                     '2021-10-01',
                     '2020-02-01',
                     '2020-05-01',
                     '2020-11-01',
                     '2022-02-01',
                     '2020-08-01',
                     '2022-05-01',
                     '2021-03-01',
          

{2: {'month_gaps': ['2021-07-01'],
     'month_max': '2021-07-01',
     'month_min': '2021-05-01'},
 5: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-05-01'},
 8: {'month_gaps': ['2022-05-01'],
     'month_max': '2023-07-01',
     'month_min': '2020-07-01'},
 11: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2021-06-01',
                     '2021-11-01',
                     '2021-08-01',
                     '2019-05-01',
                     '2021-07-01',
                     '2022-04-01',
                     '2019-08-01'],
      'month_max': '2022-10-01',
      'month_min': '2019-03-01'},
 14: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 15: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2021-01-01'},
 26: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-05-01'},
 34: {'month_gaps': [], 'month_m

{3: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-05-01'},
 13: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2020-07-01'},
 24: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-01-01'},
 33: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-04-01'},
 35: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2020-01-01'},
 42: {'month_gaps': ['2022-06-01',
                     '2022-03-01',
                     '2022-09-01',
                     '2022-12-01',
                     '2021-07-01',
                     '2023-04-01',
                     '2021-10-01',
                     '2023-01-01',
                     '2023-07-01',
                     '2023-10-01',
                     '2022-02-01',
                     '2022-05-01',
                     '2022-11-01',
                     '2022-08-01',
                     '2021-09-01',
                     '2023-06-01',
                     '2021-12-01',
           

{2: {'month_gaps': ['2021-07-01'],
     'month_max': '2021-07-01',
     'month_min': '2021-05-01'},
 5: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-05-01'},
 8: {'month_gaps': ['2022-05-01'],
     'month_max': '2023-07-01',
     'month_min': '2020-07-01'},
 11: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2021-06-01',
                     '2021-11-01',
                     '2021-08-01',
                     '2019-05-01',
                     '2021-07-01',
                     '2022-04-01',
                     '2019-08-01'],
      'month_max': '2022-10-01',
      'month_min': '2019-03-01'},
 14: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 15: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2021-01-01'},
 26: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-05-01'},
 34: {'month_gaps': [], 'month_m

{2: {'month_gaps': ['2021-07-01'],
     'month_max': '2021-07-01',
     'month_min': '2021-05-01'},
 5: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-05-01'},
 8: {'month_gaps': ['2022-05-01'],
     'month_max': '2023-07-01',
     'month_min': '2020-07-01'},
 11: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2021-06-01',
                     '2021-11-01',
                     '2021-08-01',
                     '2019-05-01',
                     '2021-07-01',
                     '2022-04-01',
                     '2019-08-01'],
      'month_max': '2022-10-01',
      'month_min': '2019-03-01'},
 14: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 15: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2021-01-01'},
 26: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-05-01'},
 34: {'month_gaps': [], 'month_m

{4: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2022-01-01'},
 8: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2019-04-01'},
 9: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 10: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-10-01'},
 11: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2021-03-01'},
 18: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2019-04-01'},
 23: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2022-03-01'},
 24: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-11-01'},
 27: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-01-01'},
 31: {'month_gaps': ['2023-01-01',
                     '2022-10-01',
                     '2022-03-01',
                     '2022-09-01',
                     '2021-06-01',
                     '2021-12-01',
                     '2022-02-01',
                     '2021-05-01',
            

{4: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2022-01-01'},
 8: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2019-04-01'},
 9: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 10: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-10-01'},
 11: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2021-03-01'},
 18: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2019-04-01'},
 23: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2022-03-01'},
 24: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-11-01'},
 27: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-01-01'},
 31: {'month_gaps': ['2023-01-01',
                     '2022-10-01',
                     '2022-03-01',
                     '2022-09-01',
                     '2021-06-01',
                     '2021-12-01',
                     '2022-02-01',
                     '2021-05-01',
            

{2: {'month_gaps': ['2021-07-01'],
     'month_max': '2021-07-01',
     'month_min': '2021-05-01'},
 5: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-05-01'},
 8: {'month_gaps': ['2022-05-01'],
     'month_max': '2023-07-01',
     'month_min': '2020-07-01'},
 11: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2021-06-01',
                     '2021-11-01',
                     '2021-08-01',
                     '2019-05-01',
                     '2021-07-01',
                     '2022-04-01',
                     '2019-08-01'],
      'month_max': '2022-10-01',
      'month_min': '2019-03-01'},
 14: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 15: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2021-01-01'},
 26: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-05-01'},
 34: {'month_gaps': [], 'month_m

{2: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2019-09-01'},
 17: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-08-01'},
 23: {'month_gaps': [], 'month_max': '2019-12-01', 'month_min': '2019-01-01'},
 27: {'month_gaps': ['2022-03-01', '2021-12-01', '2022-01-01', '2022-02-01'],
      'month_max': '2023-08-01',
      'month_min': '2021-02-01'},
 44: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2022-02-01'},
 78: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2020-03-01'},
 91: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2022-03-01'},
 109: {'month_gaps': [], 'month_max': '2020-07-01', 'month_min': '2019-07-01'},
 111: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2019-03-01'},
 126: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-11-01'},
 128: {'month_gaps': [], 'month_max': '2021-03-01', 'month_min': '2020-03-01'},
 129: {'month_gaps': ['2020-09-01',
                      '202

{1: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-01-01'},
 4: {'month_gaps': ['2022-06-01',
                    '2022-03-01',
                    '2022-12-01',
                    '2021-07-01',
                    '2023-04-01',
                    '2021-10-01',
                    '2023-01-01',
                    '2023-07-01',
                    '2023-10-01',
                    '2022-05-01',
                    '2022-08-01',
                    '2021-09-01',
                    '2023-06-01',
                    '2021-06-01',
                    '2021-12-01',
                    '2022-04-01',
                    '2022-01-01',
                    '2022-10-01',
                    '2021-08-01'],
     'month_max': '2023-11-01',
     'month_min': '2021-05-01'},
 9: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-06-01'},
 14: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-07-01'},
 15: {'month_gaps': [], 'month_max': '2020-09-01', 'm

{7: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2021-02-01'},
 26: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-06-01'},
 43: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2020-09-01'},
 50: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-02-01'},
 63: {'month_gaps': [], 'month_max': '2021-02-01', 'month_min': '2020-01-01'},
 84: {'month_gaps': [], 'month_max': '2022-01-01', 'month_min': '2021-01-01'},
 89: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-08-01'},
 104: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2022-03-01'},
 113: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2019-02-01'},
 119: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2021-06-01'},
 122: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-07-01'},
 125: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2019-11-01'},
 130: {'month_gaps': [], 'month_max': '2023-11-0

{4: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2022-01-01'},
 8: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2019-04-01'},
 9: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 10: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-10-01'},
 11: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2021-03-01'},
 18: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2019-04-01'},
 23: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2022-03-01'},
 24: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-11-01'},
 27: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-01-01'},
 31: {'month_gaps': ['2023-01-01',
                     '2022-10-01',
                     '2022-03-01',
                     '2022-09-01',
                     '2021-06-01',
                     '2021-12-01',
                     '2022-02-01',
                     '2021-05-01',
            

{3: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-05-01'},
 13: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2020-07-01'},
 24: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-01-01'},
 33: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-04-01'},
 35: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2020-01-01'},
 42: {'month_gaps': ['2022-06-01',
                     '2022-03-01',
                     '2022-09-01',
                     '2022-12-01',
                     '2021-07-01',
                     '2023-04-01',
                     '2021-10-01',
                     '2023-01-01',
                     '2023-07-01',
                     '2023-10-01',
                     '2022-02-01',
                     '2022-05-01',
                     '2022-11-01',
                     '2022-08-01',
                     '2021-09-01',
                     '2023-06-01',
                     '2021-12-01',
           

{21: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-07-01'},
 40: {'month_gaps': ['2021-03-01',
                     '2020-12-01',
                     '2021-06-01',
                     '2021-02-01',
                     '2020-11-01',
                     '2021-05-01',
                     '2021-04-01',
                     '2021-01-01'],
      'month_max': '2022-07-01',
      'month_min': '2019-10-01'},
 41: {'month_gaps': [], 'month_max': '2019-04-01', 'month_min': '2019-01-01'},
 44: {'month_gaps': ['2023-01-01',
                     '2022-03-01',
                     '2023-06-01',
                     '2021-06-01',
                     '2023-03-01',
                     '2020-10-01',
                     '2021-02-01',
                     '2022-02-01',
                     '2021-05-01',
                     '2021-11-01',
                     '2022-05-01',
                     '2022-11-01',
                     '2023-02-01',
                     '2021-08-01',
    

{1: {'month_gaps': [], 'month_max': '2020-11-01', 'month_min': '2019-02-01'},
 9: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2019-07-01'},
 18: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2020-06-01'},
 23: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2020-03-01'},
 29: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-12-01'},
 30: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2022-07-01'},
 31: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2021-02-01'},
 33: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2020-01-01'},
 40: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2021-10-01'},
 45: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2020-12-01'},
 46: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2020-02-01'},
 52: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2021-09-01'},
 85: {'month_gaps': [], 'month_max': '2023-11-01', 'mo

{2: {'month_gaps': ['2021-07-01'],
     'month_max': '2021-07-01',
     'month_min': '2021-05-01'},
 5: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-05-01'},
 8: {'month_gaps': ['2022-05-01'],
     'month_max': '2023-07-01',
     'month_min': '2020-07-01'},
 11: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2020-06-01',
                     '2021-06-01',
                     '2021-11-01',
                     '2021-08-01',
                     '2019-05-01',
                     '2021-07-01',
                     '2022-04-01',
                     '2019-08-01'],
      'month_max': '2022-10-01',
      'month_min': '2019-03-01'},
 14: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 15: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2021-01-01'},
 26: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-05-01'},
 34: {'month_gaps': [], 'month_m

{10: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2021-04-01'},
 13: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-06-01'},
 14: {'month_gaps': ['2023-02-01', '2019-11-01', '2022-11-01', '2021-02-01'],
      'month_max': '2023-05-01',
      'month_min': '2019-01-01'},
 24: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-05-01'},
 27: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2022-01-01'},
 35: {'month_gaps': ['2021-10-01',
                     '2021-03-01',
                     '2020-06-01',
                     '2020-12-01',
                     '2022-09-01',
                     '2021-09-01',
                     '2021-06-01',
                     '2022-02-01',
                     '2020-08-01',
                     '2022-05-01',
                     '2019-12-01',
                     '2021-11-01',
                     '2022-08-01',
                     '2020-01-01',
                     '2020-07-01',
              

{2: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2021-03-01'},
 7: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-01-01'},
 10: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2022-02-01'},
 14: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-01-01'},
 26: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2019-05-01'},
 30: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2020-02-01'},
 45: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2021-04-01'},
 49: {'month_gaps': [], 'month_max': '2020-09-01', 'month_min': '2019-07-01'},
 53: {'month_gaps': ['2023-02-01'],
      'month_max': '2023-07-01',
      'month_min': '2021-02-01'},
 62: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-03-01'},
 71: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2019-03-01'},
 73: {'month_gaps': ['2020-03-01',
                     '2020-02-01',
                     '2019-12-01',
    

{1: {'month_gaps': [], 'month_max': '2022-02-01', 'month_min': '2020-01-01'},
 3: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-03-01'},
 6: {'month_gaps': [], 'month_max': '2019-03-01', 'month_min': '2019-01-01'},
 8: {'month_gaps': ['2023-03-01'],
     'month_max': '2023-04-01',
     'month_min': '2021-02-01'},
 12: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-01-01'},
 23: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2020-01-01'},
 30: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 31: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-04-01'},
 32: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-02-01'},
 33: {'month_gaps': [], 'month_max': '2019-11-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-03-01'},
 37: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2022-03-01'},
 39: {'month_gaps': [], 'month_max

{1: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2019-08-01'},
 3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2020-04-01'},
 6: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2020-05-01'},
 21: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2022-06-01'},
 23: {'month_gaps': [], 'month_max': '2021-03-01', 'month_min': '2019-01-01'},
 25: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-10-01'},
 33: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2019-06-01'},
 36: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-07-01'},
 40: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2019-10-01',
                     '2020-02-01',
                     '2020-05-01',
                     '2019-09-01',
                     '2019-12-01',
                     '2020-04-01',
            

{1: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2019-08-01'},
 3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2020-04-01'},
 6: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2020-05-01'},
 21: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2022-06-01'},
 23: {'month_gaps': [], 'month_max': '2021-03-01', 'month_min': '2019-01-01'},
 25: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-10-01'},
 33: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2019-06-01'},
 36: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-07-01'},
 40: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2019-10-01',
                     '2020-02-01',
                     '2020-05-01',
                     '2019-09-01',
                     '2019-12-01',
                     '2020-04-01',
            

{2: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-04-01'},
 4: {'month_gaps': ['2020-09-01',
                    '2019-07-01',
                    '2020-06-01',
                    '2020-12-01',
                    '2019-10-01',
                    '2020-02-01',
                    '2021-06-01',
                    '2021-02-01',
                    '2020-05-01',
                    '2020-11-01',
                    '2020-08-01',
                    '2019-12-01',
                    '2021-08-01',
                    '2021-04-01',
                    '2020-07-01',
                    '2019-11-01',
                    '2021-07-01',
                    '2020-10-01'],
     'month_max': '2021-09-01',
     'month_min': '2019-06-01'},
 5: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2019-04-01'},
 11: {'month_gaps': ['2020-04-01', '2019-07-01', '2020-01-01'],
      'month_max': '2020-07-01',
      'month_min': '2019-05-01'},
 21: {'month_gaps': [], 'month_max'

{27: {'month_gaps': [], 'month_max': '2023-02-01', 'month_min': '2020-02-01'},
 45: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2020-01-01'},
 50: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-10-01'},
 53: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-01-01'},
 59: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2020-08-01'},
 60: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2020-04-01'},
 65: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2019-07-01'},
 73: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2022-07-01'},
 75: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2022-01-01'},
 76: {'month_gaps': ['2020-03-01',
                     '2020-06-01',
                     '2020-02-01',
                     '2020-05-01',
                     '2019-12-01',
                     '2020-04-01',
                     '2020-01-01',
                     '2019-11-01'],
      'm

{3: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-10-01'},
 4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-06-01'},
 12: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2022-01-01'},
 18: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-02-01'},
 33: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2023-07-01'},
 42: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-04-01'},
 48: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-03-01'},
 49: {'month_gaps': ['2020-08-01'],
      'month_max': '2020-09-01',
      'month_min': '2019-04-01'},
 69: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2020-01-01'},
 71: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2019-02-01'},
 72: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-07-01'},
 75: {'month_gaps': ['2023-01-01', '2022-10-01', '2022-11-01', '2022-07-01'],
      'month_max': '2023-03-01'

{2: {'month_gaps': ['2020-08-01', '2019-05-01', '2023-05-01'],
     'month_max': '2023-11-01',
     'month_min': '2019-01-01'},
 5: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-10-01'},
 9: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-08-01'},
 13: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2020-09-01'},
 17: {'month_gaps': [], 'month_max': '2020-08-01', 'month_min': '2019-02-01'},
 26: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2019-04-01'},
 28: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-03-01'},
 34: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-09-01'},
 35: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-01-01'},
 37: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-02-01'},
 39: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-07-01'},
 41: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-03-01'},
 44: 

{3: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-10-01'},
 4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-06-01'},
 12: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2022-01-01'},
 18: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-02-01'},
 33: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2023-07-01'},
 42: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-04-01'},
 48: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-03-01'},
 49: {'month_gaps': ['2020-08-01'],
      'month_max': '2020-09-01',
      'month_min': '2019-04-01'},
 69: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2020-01-01'},
 71: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2019-02-01'},
 72: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-07-01'},
 75: {'month_gaps': ['2023-01-01', '2022-10-01', '2022-11-01', '2022-07-01'],
      'month_max': '2023-03-01'

{1: {'month_gaps': [], 'month_max': '2022-02-01', 'month_min': '2020-01-01'},
 3: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-03-01'},
 6: {'month_gaps': [], 'month_max': '2019-03-01', 'month_min': '2019-01-01'},
 8: {'month_gaps': ['2023-03-01'],
     'month_max': '2023-04-01',
     'month_min': '2021-02-01'},
 12: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-01-01'},
 23: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2020-01-01'},
 30: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 31: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-04-01'},
 32: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-02-01'},
 33: {'month_gaps': [], 'month_max': '2019-11-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-03-01'},
 37: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2022-03-01'},
 39: {'month_gaps': [], 'month_max

{2: {'month_gaps': ['2020-08-01', '2019-05-01', '2023-05-01'],
     'month_max': '2023-11-01',
     'month_min': '2019-01-01'},
 5: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-10-01'},
 9: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-08-01'},
 13: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2020-09-01'},
 17: {'month_gaps': [], 'month_max': '2020-08-01', 'month_min': '2019-02-01'},
 26: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2019-04-01'},
 28: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-03-01'},
 34: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-09-01'},
 35: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-01-01'},
 37: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-02-01'},
 39: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-07-01'},
 41: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-03-01'},
 44: 

{1: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2019-08-01'},
 3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2020-04-01'},
 6: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2020-05-01'},
 21: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2022-06-01'},
 23: {'month_gaps': [], 'month_max': '2021-03-01', 'month_min': '2019-01-01'},
 25: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-10-01'},
 33: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2019-06-01'},
 36: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2021-07-01'},
 40: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2019-10-01',
                     '2020-02-01',
                     '2020-05-01',
                     '2019-09-01',
                     '2019-12-01',
                     '2020-04-01',
            

{3: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2020-05-01'},
 16: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-03-01'},
 17: {'month_gaps': [], 'month_max': '2020-07-01', 'month_min': '2019-05-01'},
 26: {'month_gaps': [], 'month_max': '2020-09-01', 'month_min': '2019-07-01'},
 27: {'month_gaps': [], 'month_max': '2020-07-01', 'month_min': '2019-02-01'},
 40: {'month_gaps': ['2023-05-01'],
      'month_max': '2023-07-01',
      'month_min': '2019-10-01'},
 41: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2020-05-01'},
 42: {'month_gaps': [], 'month_max': '2022-03-01', 'month_min': '2020-01-01'},
 44: {'month_gaps': [], 'month_max': '2021-08-01', 'month_min': '2019-02-01'},
 45: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-03-01'},
 55: {'month_gaps': [], 'month_max': '2020-09-01', 'month_min': '2020-01-01'},
 61: {'month_gaps': [], 'month_max': '2020-12-01', 'month_min': '2020-11-01'},
 63: {'month_gaps': [], 'mont

{4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2021-06-01'},
 6: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2022-07-01'},
 11: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2019-07-01'},
 12: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2021-04-01'},
 13: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-03-01'},
 17: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-08-01'},
 18: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2022-03-01'},
 38: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2019-10-01'},
 48: {'month_gaps': ['2020-06-01'],
      'month_max': '2022-06-01',
      'month_min': '2020-02-01'},
 54: {'month_gaps': [], 'month_max': '2019-08-01', 'month_min': '2019-05-01'},
 58: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-07-01'},
 68: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2019-02-01'},
 70: {'month_gaps': [], 'month

{2: {'month_gaps': ['2020-08-01', '2019-05-01', '2023-05-01'],
     'month_max': '2023-11-01',
     'month_min': '2019-01-01'},
 5: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-10-01'},
 9: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-08-01'},
 13: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2020-09-01'},
 17: {'month_gaps': [], 'month_max': '2020-08-01', 'month_min': '2019-02-01'},
 26: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2019-04-01'},
 28: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-03-01'},
 34: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-09-01'},
 35: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-01-01'},
 37: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-02-01'},
 39: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2023-07-01'},
 41: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-03-01'},
 44: 

{2: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2021-03-01'},
 7: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-01-01'},
 10: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2022-02-01'},
 14: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-01-01'},
 26: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2019-05-01'},
 30: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2020-02-01'},
 45: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2021-04-01'},
 49: {'month_gaps': [], 'month_max': '2020-09-01', 'month_min': '2019-07-01'},
 53: {'month_gaps': ['2023-02-01'],
      'month_max': '2023-07-01',
      'month_min': '2021-02-01'},
 62: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2020-03-01'},
 71: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2019-03-01'},
 73: {'month_gaps': ['2020-03-01',
                     '2020-02-01',
                     '2019-12-01',
    

{1: {'month_gaps': [], 'month_max': '2022-02-01', 'month_min': '2020-01-01'},
 3: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-03-01'},
 6: {'month_gaps': [], 'month_max': '2019-03-01', 'month_min': '2019-01-01'},
 8: {'month_gaps': ['2023-03-01'],
     'month_max': '2023-04-01',
     'month_min': '2021-02-01'},
 12: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2020-01-01'},
 23: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2020-01-01'},
 30: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2021-05-01'},
 31: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-04-01'},
 32: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-02-01'},
 33: {'month_gaps': [], 'month_max': '2019-11-01', 'month_min': '2019-05-01'},
 34: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2021-03-01'},
 37: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2022-03-01'},
 39: {'month_gaps': [], 'month_max

{9: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2021-09-01'},
 13: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2019-01-01'},
 19: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2020-02-01'},
 24: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2021-01-01'},
 25: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2021-02-01'},
 51: {'month_gaps': [], 'month_max': '2020-08-01', 'month_min': '2019-05-01'},
 119: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2022-01-01'},
 142: {'month_gaps': [], 'month_max': '2021-09-01', 'month_min': '2019-08-01'},
 150: {'month_gaps': [], 'month_max': '2022-06-01', 'month_min': '2020-05-01'},
 171: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2019-06-01'},
 176: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2021-08-01'},
 181: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2021-02-01'},
 183: {'month_gaps': [], 'month_max': '2019-09-

{1: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-06-01'},
 4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-01-01'},
 9: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-11-01'},
 10: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-02-01'},
 15: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-04-01'},
 17: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2019-03-01'},
 18: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2020-09-01'},
 20: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-02-01'},
 24: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2023-03-01'},
 31: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-02-01'},
 35: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-08-01'},
 38: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-04-01'},
 44: {'month_gaps': [], 'month_max': '2023-12-01', 'mon

{4: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-02-01'},
 5: {'month_gaps': ['2019-09-01', '2020-03-01'],
     'month_max': '2023-08-01',
     'month_min': '2019-07-01'},
 6: {'month_gaps': ['2022-06-01',
                    '2022-03-01',
                    '2022-09-01',
                    '2022-12-01',
                    '2023-04-01',
                    '2021-10-01',
                    '2023-01-01',
                    '2023-07-01',
                    '2022-02-01',
                    '2022-05-01',
                    '2022-11-01',
                    '2022-08-01',
                    '2023-06-01',
                    '2021-12-01',
                    '2023-03-01',
                    '2022-04-01',
                    '2022-01-01',
                    '2022-07-01',
                    '2022-10-01',
                    '2021-11-01',
                    '2023-02-01',
                    '2023-05-01',
                    '2023-08-01'],
     'month_max': '2023-0

{3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-04-01'},
 8: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2022-01-01'},
 12: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2019-01-01'},
 14: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2022-02-01'},
 27: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-05-01'},
 28: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2022-03-01'},
 29: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-02-01'},
 30: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-10-01'},
 33: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2020-01-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-05-01'},
 35: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-04-01'},
 46: {'month_gaps': [], 'month_max': '2020-11-01', 'month_min': '2019-06-01'},
 47: {'month_gaps': [], 'month_max': '2022-09-01', 'mo

{3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-04-01'},
 8: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2022-01-01'},
 12: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2019-01-01'},
 14: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2022-02-01'},
 27: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-05-01'},
 28: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2022-03-01'},
 29: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-02-01'},
 30: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-10-01'},
 33: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2020-01-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-05-01'},
 35: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-04-01'},
 46: {'month_gaps': [], 'month_max': '2020-11-01', 'month_min': '2019-06-01'},
 47: {'month_gaps': [], 'month_max': '2022-09-01', 'mo

{5: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2021-08-01'},
 10: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2020-04-01'},
 25: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2022-07-01'},
 28: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2019-02-01'},
 41: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-07-01'},
 47: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2019-01-01'},
 53: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2019-03-01'},
 61: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2019-05-01'},
 66: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-02-01'},
 69: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-09-01'},
 70: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-05-01'},
 76: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-04-01'},
 80: {'month_gaps': [], 'month_max': '2021-03-01', 'm

{4: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-02-01'},
 5: {'month_gaps': ['2019-09-01', '2020-03-01'],
     'month_max': '2023-08-01',
     'month_min': '2019-07-01'},
 6: {'month_gaps': ['2022-06-01',
                    '2022-03-01',
                    '2022-09-01',
                    '2022-12-01',
                    '2023-04-01',
                    '2021-10-01',
                    '2023-01-01',
                    '2023-07-01',
                    '2022-02-01',
                    '2022-05-01',
                    '2022-11-01',
                    '2022-08-01',
                    '2023-06-01',
                    '2021-12-01',
                    '2023-03-01',
                    '2022-04-01',
                    '2022-01-01',
                    '2022-07-01',
                    '2022-10-01',
                    '2021-11-01',
                    '2023-02-01',
                    '2023-05-01',
                    '2023-08-01'],
     'month_max': '2023-0

{1: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-06-01'},
 4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-01-01'},
 9: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-11-01'},
 10: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-02-01'},
 15: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-04-01'},
 17: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2019-03-01'},
 18: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2020-09-01'},
 20: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-02-01'},
 24: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2023-03-01'},
 31: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-02-01'},
 35: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-08-01'},
 38: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-04-01'},
 44: {'month_gaps': [], 'month_max': '2023-12-01', 'mon

{2: {'month_gaps': ['2020-12-01',
                    '2019-10-01',
                    '2022-03-01',
                    '2022-09-01',
                    '2022-12-01',
                    '2023-04-01',
                    '2021-10-01',
                    '2020-11-01',
                    '2020-08-01',
                    '2022-05-01',
                    '2019-12-01',
                    '2022-11-01',
                    '2021-03-01',
                    '2021-06-01',
                    '2023-03-01',
                    '2020-01-01',
                    '2022-01-01',
                    '2020-10-01',
                    '2022-10-01',
                    '2021-05-01',
                    '2023-02-01',
                    '2021-08-01',
                    '2023-05-01'],
     'month_max': '2023-08-01',
     'month_min': '2019-09-01'},
 19: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2021-06-01'},
 26: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-05-

{3: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2022-08-01'},
 6: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2020-07-01'},
 10: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2019-04-01'},
 12: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-03-01'},
 15: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2021-01-01'},
 20: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2022-03-01'},
 22: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2019-05-01'},
 23: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-02-01'},
 29: {'month_gaps': ['2020-03-01',
                     '2020-09-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2022-12-01',
                     '2021-01-01',
                     '2021-07-01',
                     '2023-04-01',
                     '2023-01-01',
                     '2020-05-01',
                    

{4: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-02-01'},
 5: {'month_gaps': ['2019-09-01', '2020-03-01'],
     'month_max': '2023-08-01',
     'month_min': '2019-07-01'},
 6: {'month_gaps': ['2022-06-01',
                    '2022-03-01',
                    '2022-09-01',
                    '2022-12-01',
                    '2023-04-01',
                    '2021-10-01',
                    '2023-01-01',
                    '2023-07-01',
                    '2022-02-01',
                    '2022-05-01',
                    '2022-11-01',
                    '2022-08-01',
                    '2023-06-01',
                    '2021-12-01',
                    '2023-03-01',
                    '2022-04-01',
                    '2022-01-01',
                    '2022-07-01',
                    '2022-10-01',
                    '2021-11-01',
                    '2023-02-01',
                    '2023-05-01',
                    '2023-08-01'],
     'month_max': '2023-0

{3: {'month_gaps': [], 'month_max': '2020-04-01', 'month_min': '2019-04-01'},
 8: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2022-01-01'},
 12: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2019-01-01'},
 14: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2022-02-01'},
 27: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-05-01'},
 28: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2022-03-01'},
 29: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-02-01'},
 30: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-10-01'},
 33: {'month_gaps': [], 'month_max': '2021-05-01', 'month_min': '2020-01-01'},
 34: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2022-05-01'},
 35: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2023-04-01'},
 46: {'month_gaps': [], 'month_max': '2020-11-01', 'month_min': '2019-06-01'},
 47: {'month_gaps': [], 'month_max': '2022-09-01', 'mo

{3: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2020-11-01'},
 11: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-02-01'},
 18: {'month_gaps': [], 'month_max': '2023-03-01', 'month_min': '2019-03-01'},
 21: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-02-01'},
 34: {'month_gaps': [], 'month_max': '2019-07-01', 'month_min': '2019-01-01'},
 36: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2022-03-01'},
 39: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-07-01'},
 42: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-04-01'},
 43: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2021-02-01'},
 44: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2019-10-01'},
 47: {'month_gaps': [], 'month_max': '2019-12-01', 'month_min': '2019-01-01'},
 51: {'month_gaps': [], 'month_max': '2021-02-01', 'month_min': '2020-02-01'},
 54: {'month_gaps': ['2020-02-01', '2021-03-01'],
   

{1: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-06-01'},
 4: {'month_gaps': [], 'month_max': '2021-07-01', 'month_min': '2020-01-01'},
 9: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-11-01'},
 10: {'month_gaps': [], 'month_max': '2023-09-01', 'month_min': '2019-02-01'},
 15: {'month_gaps': [], 'month_max': '2023-12-01', 'month_min': '2019-04-01'},
 17: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2019-03-01'},
 18: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2020-09-01'},
 20: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2021-02-01'},
 24: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2023-03-01'},
 31: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2019-02-01'},
 35: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-08-01'},
 38: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2020-04-01'},
 44: {'month_gaps': [], 'month_max': '2023-12-01', 'mon

{4: {'month_gaps': [], 'month_max': '2022-12-01', 'month_min': '2019-09-01'},
 12: {'month_gaps': [], 'month_max': '2023-05-01', 'month_min': '2022-01-01'},
 15: {'month_gaps': ['2023-04-01'],
      'month_max': '2023-10-01',
      'month_min': '2019-07-01'},
 23: {'month_gaps': [], 'month_max': '2020-12-01', 'month_min': '2019-10-01'},
 42: {'month_gaps': [], 'month_max': '2023-07-01', 'month_min': '2021-05-01'},
 43: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2020-09-01'},
 44: {'month_gaps': [], 'month_max': '2021-04-01', 'month_min': '2020-02-01'},
 50: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2021-01-01'},
 54: {'month_gaps': [], 'month_max': '2021-04-01', 'month_min': '2021-03-01'},
 60: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2022-03-01'},
 61: {'month_gaps': [], 'month_max': '2023-01-01', 'month_min': '2020-01-01'},
 64: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2023-01-01'},
 82: {'month_gaps': [], 'mont

{3: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2022-08-01'},
 6: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2020-07-01'},
 10: {'month_gaps': [], 'month_max': '2021-06-01', 'month_min': '2019-04-01'},
 12: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-03-01'},
 15: {'month_gaps': [], 'month_max': '2023-04-01', 'month_min': '2021-01-01'},
 20: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2022-03-01'},
 22: {'month_gaps': [], 'month_max': '2023-10-01', 'month_min': '2019-05-01'},
 23: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-02-01'},
 29: {'month_gaps': ['2020-03-01',
                     '2020-09-01',
                     '2019-07-01',
                     '2022-06-01',
                     '2022-12-01',
                     '2021-01-01',
                     '2021-07-01',
                     '2023-04-01',
                     '2023-01-01',
                     '2020-05-01',
                    

{4: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2021-01-01'},
 6: {'month_gaps': [], 'month_max': '2020-11-01', 'month_min': '2019-07-01'},
 16: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-08-01'},
 17: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2021-01-01'},
 22: {'month_gaps': [], 'month_max': '2021-04-01', 'month_min': '2020-02-01'},
 37: {'month_gaps': ['2023-01-01', '2021-03-01'],
      'month_max': '2023-04-01',
      'month_min': '2019-01-01'},
 38: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2019-05-01'},
 50: {'month_gaps': [], 'month_max': '2019-09-01', 'month_min': '2019-08-01'},
 54: {'month_gaps': ['2020-03-01',
                     '2019-07-01',
                     '2021-03-01',
                     '2022-03-01',
                     '2022-02-01',
                     '2021-05-01',
                     '2020-04-01',
                     '2021-08-01',
                     '2022-04-01',
                 

{3: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2020-11-01'},
 11: {'month_gaps': [], 'month_max': '2020-06-01', 'month_min': '2019-02-01'},
 18: {'month_gaps': [], 'month_max': '2023-03-01', 'month_min': '2019-03-01'},
 21: {'month_gaps': [], 'month_max': '2022-04-01', 'month_min': '2019-02-01'},
 34: {'month_gaps': [], 'month_max': '2019-07-01', 'month_min': '2019-01-01'},
 36: {'month_gaps': [], 'month_max': '2023-06-01', 'month_min': '2022-03-01'},
 39: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-07-01'},
 42: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2020-04-01'},
 43: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2021-02-01'},
 44: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2019-10-01'},
 47: {'month_gaps': [], 'month_max': '2019-12-01', 'month_min': '2019-01-01'},
 51: {'month_gaps': [], 'month_max': '2021-02-01', 'month_min': '2020-02-01'},
 54: {'month_gaps': ['2020-02-01', '2021-03-01'],
   

{5: {'month_gaps': [], 'month_max': '2023-08-01', 'month_min': '2021-08-01'},
 10: {'month_gaps': [], 'month_max': '2022-07-01', 'month_min': '2020-04-01'},
 25: {'month_gaps': [], 'month_max': '2023-11-01', 'month_min': '2022-07-01'},
 28: {'month_gaps': [], 'month_max': '2022-05-01', 'month_min': '2019-02-01'},
 41: {'month_gaps': [], 'month_max': '2020-10-01', 'month_min': '2020-07-01'},
 47: {'month_gaps': [], 'month_max': '2022-10-01', 'month_min': '2019-01-01'},
 53: {'month_gaps': [], 'month_max': '2021-12-01', 'month_min': '2019-03-01'},
 61: {'month_gaps': [], 'month_max': '2022-09-01', 'month_min': '2019-05-01'},
 66: {'month_gaps': [], 'month_max': '2022-11-01', 'month_min': '2020-02-01'},
 69: {'month_gaps': [], 'month_max': '2021-11-01', 'month_min': '2020-09-01'},
 70: {'month_gaps': [], 'month_max': '2022-08-01', 'month_min': '2019-05-01'},
 76: {'month_gaps': [], 'month_max': '2021-10-01', 'month_min': '2019-04-01'},
 80: {'month_gaps': [], 'month_max': '2021-03-01', 'm

### 2, parallel upload

Your task is to upload data from an object store (can be S3, HDFS, ...) to Elastic.
Elastic server has a given number of data nodes e.x. 5 and each one of the nodes will hold some of the indices you want to upload your data into. You can get the info by an API call such as `http://your-elastic-server:port/_cat/shards` this will get you an output of all the indices and details about them in a structure: name_of_the_index, (some unimportant stats), data_node

Example:
- index-2018-01 … data-node-01
- index-2018-02 … data-node-03
- index-2018-03 … data-node-02
- index-2018-04 … data-node-04
- index-2018-05 … data-node-04
- index-2018-06 … data-node-05
- index-2018-07 … data-node-01

As you can see the distribution is quite random, but it's going to be pretty even across the data-nodes.

One specific index will always hold data for a year and month combination e.x. index-2018-01 will have all the data for 2018-01. Luckily your teammates already prepared the data for you with this structure in mind, so your data in object store are partitioned by year and month as you need, e.x. `created_year=2017/created_month=1`, `created_year=2018/created_month=12`, etc. and you also have a function `write_to_elastic_index(df, year, month, target_index)` you can leverage.
All you have to do is point our df (dataframe) to the right location, specify the partition filters (year and month) and target_index and it will do the dirty work for you (well, in reality it's going to leverage the elasticsearch-spark library).

```python
def write_to_elastic_index(df, year, month, target_index) -> None:
    """Writes data filtered from dataframe (df) by the created_year=/created_month partition filter into given Elastic index.

    Example:
    # write data from object store partition 'created_year=2018/created_month=1' into Elasticsearch index named 'index-2018-01'
    write_to_elastic(df, 2018, 1, 'index-2018-01')
"""

```

To effectively utilize the resources, it makes sense to parallelize the task as much as possible, but you cannot process more than one write request per data-node, otherwise it will crush. Write an application that will handle uploading all the data in the most effective way.
You can assume a fixed number of data-nodes (or find out the number based on the API response) and data at least from `2017-01 (created_year=2017/created_month=1)` to `2020-01 (created_year=2020/created_month=1)` without any gaps.



### Solution

Since each data node allows only a single write operation, our task parallelization is constrained by the number of nodes on the server. Each thread will only load indices for a specific node.

In [6]:
from time import sleep
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict

import sys
import logging

logging.basicConfig(level=logging.INFO, format="%(threadName)s:%(message)s")
logger = logging.getLogger(__name__)


# Fake function to get {name_of_the_index: data_node} mapping
def get_shards(date_range, number_of_nodes=5):
    return {"index-{}-{:02d}".format(*date): "data-node-{:02d}".format(randint(1, number_of_nodes)) for date in date_range}

# Fake function to write data to elastic
def write_to_elastic_index(df, year, month, target_index) -> None:
    """Writes data filtered from dataframe (df) by the created_year=/created_month partition filter into given Elastic index.

    Example:
    # write data from object store partition 'created_year=2018/created_month=1' into Elasticsearch index named 'index-2018-01'
    write_to_elastic(df, 2018, 1, 'index-2018-01')
    """
    sleep(0.1)
    
    
# Upload all indices to specific node
def upload_indices_to_node(node, target_indices):
    for target_index in target_indices:
        year, month = map(int, target_index.split('-')[1:])
        logger.info(f"({node}, {target_index}) -> Writing...")
        write_to_elastic_index([], year, month, target_index)
        logger.info(f"({node}, {target_index}) -> Completed")
        


# We can reuse date range calculation from the previous task
date_range = calc_month_range((2017, 1), (2020, 1))

# Number of nodes
number_of_nodes = 5

# Get index: node mapping
shards = get_shards(date_range, number_of_nodes)

# Aggregate indices by nodes
node_to_index = defaultdict(list)
for target_index, node in shards.items():
    node_to_index[node].append(target_index)

### Parallel threads

In [7]:
%%time

with ThreadPoolExecutor(max_workers=number_of_nodes) as executor:
    futures = [executor.submit(upload_indices_to_node, node, target_indices) for node, target_indices in node_to_index.items()]
    
    for future in futures:
        future.result()

ThreadPoolExecutor-0_0:(data-node-04, index-2019-04) -> Writing...
ThreadPoolExecutor-0_1:(data-node-03, index-2019-07) -> Writing...
ThreadPoolExecutor-0_2:(data-node-02, index-2019-10) -> Writing...
ThreadPoolExecutor-0_3:(data-node-01, index-2017-01) -> Writing...
ThreadPoolExecutor-0_4:(data-node-05, index-2018-11) -> Writing...
ThreadPoolExecutor-0_0:(data-node-04, index-2019-04) -> Completed
ThreadPoolExecutor-0_0:(data-node-04, index-2017-07) -> Writing...
ThreadPoolExecutor-0_1:(data-node-03, index-2019-07) -> Completed
ThreadPoolExecutor-0_1:(data-node-03, index-2018-02) -> Writing...
ThreadPoolExecutor-0_2:(data-node-02, index-2019-10) -> Completed
ThreadPoolExecutor-0_2:(data-node-02, index-2017-04) -> Writing...
ThreadPoolExecutor-0_3:(data-node-01, index-2017-01) -> Completed
ThreadPoolExecutor-0_3:(data-node-01, index-2018-08) -> Writing...
ThreadPoolExecutor-0_4:(data-node-05, index-2018-11) -> Completed
ThreadPoolExecutor-0_4:(data-node-05, index-2019-03) -> Writing...


CPU times: total: 0 ns
Wall time: 1.02 s


### Single thread

In [8]:
%%time

for node, target_indices in node_to_index.items():
    upload_indices_to_node(node, target_indices)

MainThread:(data-node-04, index-2019-04) -> Writing...
MainThread:(data-node-04, index-2019-04) -> Completed
MainThread:(data-node-04, index-2017-07) -> Writing...
MainThread:(data-node-04, index-2017-07) -> Completed
MainThread:(data-node-04, index-2018-05) -> Writing...
MainThread:(data-node-04, index-2018-05) -> Completed
MainThread:(data-node-04, index-2019-12) -> Writing...
MainThread:(data-node-04, index-2019-12) -> Completed
MainThread:(data-node-04, index-2017-09) -> Writing...
MainThread:(data-node-04, index-2017-09) -> Completed
MainThread:(data-node-04, index-2017-06) -> Writing...
MainThread:(data-node-04, index-2017-06) -> Completed
MainThread:(data-node-04, index-2018-01) -> Writing...
MainThread:(data-node-04, index-2018-01) -> Completed
MainThread:(data-node-04, index-2019-05) -> Writing...
MainThread:(data-node-04, index-2019-05) -> Completed
MainThread:(data-node-04, index-2017-05) -> Writing...
MainThread:(data-node-04, index-2017-05) -> Completed
MainThread:(data-no

CPU times: total: 0 ns
Wall time: 3.75 s


## Spark (solve with Spark 2.4+)

### 3, caching

Assume we have a parquet file with these three columns: col1, col2, col3 (all have numerical values). Next we create a Spark DataFrame as follows:
```python
df = spark.read.parquet(path_to_the_data)
```

In the next step we filter the data and use caching on the filtered DataFrame:
```python
df.select('col1', 'col2').filter(col('col2') > 100).cache()
df.count()
```

Now we run these three queries:
```python
1) df.select('col1', 'col2').filter(col('col2') > 101).collect()
2) df.select('col1', 'col2').withColumn('col4', lit('test')).filter(col('col2') > 100).collect()
2) df.select('col1').filter(col('col2') > 100).collect()
```
**Which of these three queries will take the data from cache? Please explain your answer.**

### Answer
We can see if a DataFrame was cached in our physical plan using `explain` operator (where `InMemoryRelation` entities reflect cached datasets with their storage level). 

Of the three options, only the third returns InMemoryRelation entity. So the correct answer is:
```python
3) df.select('col1').filter(col('col2') > 100).collect()
```

In other cases, PySpark will scan original parquet.

But we still can manage the caching ourselves. We can enforce the use of caching with a command like the following:
```python
df2 = df.select('col1', 'col2').filter(col('col2') > 100).cache()
df2.count()
```
However, it is important to carefully analyze our actions; failing to do so could lead to incorrect results.

In the first case, we can use the cache because we are only accessing data within the cached dataframe where the values in 'col2' are greater than 100
```python
df2.select('col1', 'col2').filter(col('col2') > 101).collect()
```
In the second case, this will also not affect the result. So we can safely use the command:
```python
df2.select('col1', 'col2').withColumn('col4', lit('test')).filter(col('col2') > 100).collect()
```

In [9]:
from pyspark.sql.functions import rand, col, lit
from pyspark.sql import SparkSession
import os


def generate_dummy_data(file_name, size=1E6):
    spark = SparkSession.builder.appName("Dummy Data Generator").getOrCreate()
    df = spark.range(size)\
        .withColumn("col1", rand() * 1000)\
        .withColumn("col2", rand() * 1000)\
        .withColumn("col3", rand() * 1000)
    df.coalesce(10).write.parquet('dummy_data', mode='overwrite')
    spark.stop()

### Read parquet

In [10]:
spark = SparkSession.builder.appName("Read parquet").getOrCreate()

dummy_data_path = 'dummy_data'

if not os.path.exists(dummy_data_path):
    generate_dummy_data(dummy_data_path)

spark = SparkSession.builder.appName("Read parquet").getOrCreate()
df = spark.read.parquet(dummy_data_path)

### Caching

In [11]:
df.select('col1', 'col2').filter(col('col2') > 100).cache()
df.count()

1000000

### 1) `df.select('col1', 'col2').filter(col('col2') > 101).collect()`

In [12]:
df.select('col1', 'col2').filter(col('col2') > 101).explain()

== Physical Plan ==
*(1) Filter (isnotnull(col2#2) AND (col2#2 > 101.0))
+- *(1) ColumnarToRow
   +- FileScan parquet [col1#1,col2#2] Batched: true, DataFilters: [isnotnull(col2#2), (col2#2 > 101.0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/C:/Users/nekowaii/Documents/GitHub/Accolade-assignment/dummy_data], PartitionFilters: [], PushedFilters: [IsNotNull(col2), GreaterThan(col2,101.0)], ReadSchema: struct<col1:double,col2:double>




### 2) `df.select('col1', 'col2').withColumn('col4', lit('test')).filter(col('col2') > 100).collect()`

In [13]:
df.select('col1', 'col2').withColumn('col4', lit('test')).filter(col('col2') > 100).explain()

== Physical Plan ==
*(1) Project [col1#1, col2#2, test AS col4#34]
+- *(1) Filter (isnotnull(col2#2) AND (col2#2 > 100.0))
   +- *(1) ColumnarToRow
      +- FileScan parquet [col1#1,col2#2] Batched: true, DataFilters: [isnotnull(col2#2), (col2#2 > 100.0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/C:/Users/nekowaii/Documents/GitHub/Accolade-assignment/dummy_data], PartitionFilters: [], PushedFilters: [IsNotNull(col2), GreaterThan(col2,100.0)], ReadSchema: struct<col1:double,col2:double>




### * 3) `df.select('col1').filter(col('col2') > 100).collect()`

In [14]:
df.select('col1').filter(col('col2') > 100).explain()

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- InMemoryTableScan [col1#1]
      +- InMemoryRelation [col1#1, col2#2], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Filter (isnotnull(col2#2) AND (col2#2 > 100.0))
               +- *(1) ColumnarToRow
                  +- FileScan parquet [col1#1,col2#2] Batched: true, DataFilters: [isnotnull(col2#2), (col2#2 > 100.0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/C:/Users/nekowaii/Documents/GitHub/Accolade-assignment/dummy_data], PartitionFilters: [], PushedFilters: [IsNotNull(col2), GreaterThan(col2,100.0)], ReadSchema: struct<col1:double,col2:double>




In [15]:
spark.stop()