# Задача:
Взять топ 150 стран по объёму импорта и зеркального импорта товара в 2019 году, Россию туда добавить в любом случае. 

Для каждого месяца с 2015 по 2023 год должна быть информация - страна импортёр, страна из которой был ввезён импорт, объёмы физические и стоимостные для каждой комбинации

# Импорт модулей

In [2]:
import time
import os
import pandas as pd
import requests

# Коды UN-49M

Это коды соответствия названия страны/обьединения и их кода

In [3]:
URL = 'https://comtradeapi.un.org/files/v1/app/reference/Reporters.json'

response = requests.get(URL)
data = response.json()

m49codeToName = dict()

for country in data['results']:
    reporterCode = country['id']
    name = country['text']
    isGroup = country['isGroup']

    if not isGroup:
        m49codeToName[reporterCode] = name

для примера, код Российской федерации - 643

In [4]:
m49codeToName[643]

'Russian Federation'

# Топ 150 стран по обьему импорта бензина в 2019 году

Получим список стран для которых будем собирать статистику. 2019 год взят за основу

Определимся с **hs_code** - кодом товара, по которому будем собирать статистику и зеркальную статистику

In [5]:
hs_code = 270900 # Нефть

In [8]:
PRIMARY_KEY = '9880e05710ad45f6936acc3b7286211f' # эти ключи регулярно обновляются и удаляются, поэтому не страшно их выкладывать

In [14]:
typeCode = 'C' # Товары
freqCode = 'A' # Данные по годам
clCode = 'HS'  

URL = f'https://comtradeapi.un.org/data/v1/get/{typeCode}/{freqCode}/{clCode}' 

params = {
    'subscription-key': PRIMARY_KEY, # Ключ доступа к API Comtrade
    'partnerCode':      0,           # Код UN M49 для всего мира
    'period':           2019,
    'flowCode':         'M',         # Импорт
    'cmdCode':          hs_code      #
}

In [9]:
response = requests.get(URL, params)

response_data = response.json()['data']
df = pd.json_normalize(response_data)

df['Country'] = df.reporterCode.apply(lambda key: m49codeToName.get(key, ''))

In [10]:
top150 = df.sort_values(by='primaryValue', ascending=False)\
    .drop_duplicates(subset=['reporterCode'])\
    [['Country', 'reporterCode', 'primaryValue']]\
    .reset_index(drop=True)\
    .head(150)

In [11]:
top150.head(10)

Unnamed: 0,Country,reporterCode,primaryValue
0,China,156,242384900000.0
1,USA,842,132370700000.0
2,India,699,101948600000.0
3,Japan,392,73086770000.0
4,Rep. of Korea,410,70251770000.0
5,Germany,276,40745650000.0
6,Netherlands,528,34062710000.0
7,Spain,724,29994410000.0
8,Italy,380,29102360000.0
9,United Arab Emirates,784,27750560000.0


### Проверяем что там есть РФ

In [12]:
top150[top150.reporterCode == 643]

Unnamed: 0,Country,reporterCode,primaryValue
89,Russian Federation,643,80552.0


# Создаем список стран

In [15]:
CountryID = list(top150.reporterCode)

In [202]:
type(CountryID[0])

int

# Соберем статистику импорта

Для каждого месяца с 2015 по 2023 год должна быть информация - страна Экспортер, страна в которую был ввезён экспорт, объёмы физические и стоимостные для каждой комбинации

По скольку Comtrade API блокирует запросы которые поступают чаще, чем раз в 10 секунд, то все данные собираются синхронно

*Для асинхронной работы можно получить специальный доступ у них на сайте*

In [16]:
min_year = 2015
max_year = 2023

years = range(min_year, max_year + 1)
months = [f"{month:02}" for month in range(1, 12 + 1)]

### сформируем запрос

он будет немного отличаться, чтобы не забывать его структуру, дублирую

In [20]:
typeCode = 'C' # Товары
freqCode = 'M' # Данные по месяцам
clCode = 'HS'  # Тип товара 

URL = f'https://comtradeapi.un.org/data/v1/get/{typeCode}/{freqCode}/{clCode}' 

params = {
    'subscription-key': PRIMARY_KEY, # Subscription key 
    'flowCode':         'X',         # Export
    'cmdCode':          hs_code,     #
    'reporterCode':     ','.join(map(str, CountryID))
}

In [19]:
def pull_data(params):
    """получение pd.DataFrame по запросу Comtrade API"""
    
    response = requests.get(URL, params)

    try:
        response_data = response.json()['data']
        df = pd.json_normalize(response_data)
    except Exception as exc:
        raise Exception(f'cant normalize data: {exc}, response: {response.json()}')
    return df        

# Сформируем модель получаемых данных

Данные будут сохраняться в датафрейм со следующими столбцами:
- period: временной промежуток времени данной статистики
- reporterCode: код страны, из которой проводился экспорт
- primaryValue: стоимость поставки (долл. США)
- netWgt: физический обьем

In [21]:
data = pd.DataFrame({'period': [], 'reporterCode': [], 'primaryValue': [], 'netWgt': []})

# Парсинг данных

In [None]:
for i, year in enumerate(years):
    print(f'{(i + 1) /  9 * 100:0.2f}%\t{year = }')

    period = ','.join(str(year) + str(month) for month in months)
    params['period'] = period
    response = requests.get(URL, params)    
    
    try:
        response_data = response.json()['data']
        df = pd.json_normalize(response_data)

    except Exception as exc:
        print(f'\t\tPULLING STOPPED\n\nPulling stopped with exception: {exc}\n\nData saved to ./ABORTED DATA: {hash(period)}.pkl')
        data.to_pickle(f'ABORTED DATA: {hash(period)}.pkl')
        break
    
    periods = df['period'].unique()

    # Проверка на healthcheck - 842 это код США, если там адекватные значения, то парсинг удачный
    print(df[df['reporterCode'] == 842]['primaryValue'].mean())

    
    for period_ in periods:
        for reporterCode in CountryID:
            filtered_df = df[(df['reporterCode'] == reporterCode) & (df['period'] == period_)]
            
            partners = list(filtered_df['partnerCode'])
            wgts = list(filtered_df['netWgt'])
            values = list(filtered_df['primaryValue'])
    
            primaryValues = {country: value for country, value in zip(partners, values)}
            netWgts = {country: weight for country, weight in zip(partners, wgts)}
    
            row = pd.DataFrame({'period': [period_], 'reporterCode': [reporterCode], 'primaryValue': [primaryValues], 'netWgt': [netWgts]})
            data = pd.concat([data, row])

                                                          

11.11%	year = 2015
356679053.5102041
22.22%	year = 2016


In [238]:
data[data['primaryValue'] == {}].cmdCode.unique()

array([784., 566., 156., 586., 608., 434., 422., 104., 591., 804., 404.,
       516., 188., 144., 116.,  68., 340., 196., 800., 120., 788., 780.,
       716., 466.,   4.,  44., 496., 268., 524., 388., 686.,  96., 894.,
       508., 454.,  31., 275., 426., 418., 328.,  51., 450., 498., 762.,
       598., 352., 204., 108., 882., 478.,  84., 690., 446.,   8., 490.,
       662., 140.,  52.,  20.,  60., 860., 132., 400., 634., 768., 231.,
       170., 887., 376., 258., 308., 174., 834., 600., 558., 178., 180.,
       218., 470., 398., 417., 360., 504., 288.,  nan])

In [257]:
data.dropna(subset=['reporterCode'], inplace=True)
data.drop(columns=['cmdCode'], inplace=True)

In [261]:
data.to_csv('graph data exp.csv')

# Зеркальная статистика

In [484]:
typeCode = 'C' # Товары
freqCode = 'M' # Данные по месяцам
clCode = 'HS' 

URL = f'https://comtradeapi.un.org/data/v1/get/{typeCode}/{freqCode}/{clCode}' 

params = {
    'subscription-key': PRIMARY_KEY, # Subscription key 
    'flowCode':         'X',         # Export
    'cmdCode':          hs_code,     #
    'partnerCode':      None,
    'reporterCode':     ','.join(map(str, CountryID))
}

In [384]:
def pull_data(params):    
    response = requests.get(URL, params)

    response_data = response.json()['data']
    
    df = pd.json_normalize(response_data)
    
    return df        

In [393]:
mirror_data = pd.DataFrame({'period': [], 'reporterCode': [], 'mirrorPrimaryValue': [], 'mirrorNetWgt': []})

In [394]:
min_year = 2015
max_year = 2023

def years_g():
    for year in range(min_year, max_year + 1):
        yield year

years = years_g()

months = [f"{month:02}" for month in range(1, 12 + 1)]

In [395]:
for year in [2021, 2022, 2023]:
    print(f'{year = }')
    period = ','.join(str(year) + str(month) for month in months)

    for i, partnerCode in enumerate(CountryID):
        time.sleep(10)

        params['period'] = period
        params['partnerCode'] = partnerCode

        response = requests.get(URL, params, timeout=120)

        try:
            response_data = response.json()['data']
            df = pd.json_normalize(response_data)        
        except Exception as exc:        
            print(exc)
            time.sleep(10)
            df = pull_data(params)

        print(f'\t{(i / 150 * 100):0.2f}%\{partnerCode = }')
            
        periods = period.split(',')
        
        for period_ in periods:
            try:
                filtered_df = df[df['period'] == period_]
                
                reporters = list(filtered_df['reporterCode'])
                m_values = list(filtered_df['primaryValue'])
                m_wgts = list(filtered_df['netWgt'])
            
                m_primaryValues = {country: value for country, value in zip(reporters, m_values)}
                m_netWgts = {country: weight for country, weight in zip(reporters, m_wgts)}
            
                row = pd.DataFrame({'period': [period_], 'reporterCode': [partnerCode],
                                            'mirrorPrimaryValue': [m_primaryValues], 'mirrorNetWgt': [m_netWgts]})
            
                mirror_data = pd.concat([mirror_data, row])
            except:
                row = pd.DataFrame({'period': [period_], 'reporterCode': [partnerCode],
                                                'mirrorPrimaryValue': [{}], 'mirrorNetWgt': [{}]})
                
                mirror_data = pd.concat([mirror_data, row])
    
                                                              

year = 2021
	0.00%\partnerCode = 842
	0.67%\partnerCode = 784
	1.33%\partnerCode = 484
	2.00%\partnerCode = 702
	2.67%\partnerCode = 410
	3.33%\partnerCode = 528
	4.00%\partnerCode = 392
	4.67%\partnerCode = 360
	5.33%\partnerCode = 124
	6.00%\partnerCode = 458
	6.67%\partnerCode = 276
	7.33%\partnerCode = 818
	8.00%\partnerCode = 76
	8.67%\partnerCode = 566
	9.33%\partnerCode = 710
	10.00%\partnerCode = 156
	10.67%\partnerCode = 56
	11.33%\partnerCode = 586
	12.00%\partnerCode = 608
	12.67%\partnerCode = 764
	13.33%\partnerCode = 36
	14.00%\partnerCode = 682
	14.67%\partnerCode = 826
	15.33%\partnerCode = 251
	16.00%\partnerCode = 170
	16.67%\partnerCode = 214
	17.33%\partnerCode = 24
	18.00%\partnerCode = 699
	18.67%\partnerCode = 434
	19.33%\partnerCode = 704
	20.00%\partnerCode = 422
	20.67%\partnerCode = 104
	21.33%\partnerCode = 380
	22.00%\partnerCode = 591
	22.67%\partnerCode = 757
	23.33%\partnerCode = 320
	24.00%\partnerCode = 752
	24.67%\partnerCode = 724
	25.33%\partnerCode

ConnectionError: HTTPSConnectionPool(host='comtradeapi.un.org', port=443): Max retries exceeded with url: /data/v1/get/C/M/HS?subscription-key=bb2254ada9f74608a3996ce3b2b2227b&flowCode=X&cmdCode=271012&partnerCode=598&reporterCode=842%2C784%2C484%2C702%2C410%2C528%2C392%2C360%2C124%2C458%2C276%2C818%2C76%2C566%2C710%2C156%2C56%2C586%2C608%2C764%2C36%2C682%2C826%2C251%2C170%2C214%2C24%2C699%2C434%2C704%2C422%2C104%2C380%2C591%2C757%2C320%2C752%2C724%2C804%2C404%2C516%2C188%2C792%2C72%2C554%2C144%2C246%2C116%2C887%2C834%2C68%2C348%2C340%2C604%2C196%2C376%2C800%2C242%2C579%2C120%2C222%2C788%2C384%2C344%2C504%2C600%2C40%2C780%2C203%2C705%2C716%2C372%2C466%2C4%2C32%2C616%2C233%2C300%2C44%2C496%2C268%2C208%2C524%2C854%2C388%2C703%2C442%2C152%2C686%2C417%2C96%2C748%2C894%2C508%2C454%2C620%2C100%2C31%2C275%2C646%2C558%2C426%2C180%2C418%2C328%2C440%2C51%2C470%2C428%2C288%2C450%2C498%2C191%2C762%2C70%2C598%2C352%2C807%2C204%2C108%2C882%2C642%2C512%2C688%2C478%2C643%2C84%2C690%2C446%2C8%2C490%2C662%2C258%2C140%2C112%2C499%2C52%2C308%2C218%2C20%2C60%2C178%2C398%2C860%2C132%2C400%2C634%2C768%2C174%2C231&period=202301%2C202302%2C202303%2C202304%2C202305%2C202306%2C202307%2C202308%2C202309%2C202310%2C202311%2C202312 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fc2c4dea7d0>: Failed to resolve 'comtradeapi.un.org' ([Errno -2] Name or service not known)"))

In [387]:
response.json()

{'statusCode': 403,
 'message': 'Out of call volume quota. Quota will be replenished in 21:01:26.'}

In [361]:
response = requests.get(URL, params)

In [396]:
mirror_data

Unnamed: 0,period,reporterCode,mirrorPrimaryValue,mirrorNetWgt
0,202101,842.0,"{222: 1089459.95, 233: 8863666.12, 246: 651618...","{222: 1252250.0, 233: 18131920.0, 246: 1395899..."
0,202102,842.0,"{222: 1389884.9, 233: 10508368.88, 246: 193194...","{222: 1597570.0, 233: 14053979.0, 246: 3003950..."
0,202103,842.0,"{724: 215091667.723, 76: 24859856.0, 620: 8062...","{724: 356608756.0, 76: 55562300.0, 620: 131872..."
0,202104,842.0,"{372: 68424.619, 484: 3222943.0, 643: 49724003...","{372: 17850.0, 484: 5166822.351, 643: 83345570..."
0,202105,842.0,"{484: 29124.0, 233: 11935228.29, 688: 375.0, 6...","{484: 46689.791, 233: 14912453.0, 688: 192.0, ..."
...,...,...,...,...
0,202308,70.0,"{276: 49762.99, 380: 29160.499, 528: 85.089, 6...","{276: 10847.0, 380: 23280.0, 528: 22.0, 616: 7..."
0,202309,70.0,"{56: 95.471, 191: 7668845.0, 203: 15.0, 276: 7...","{56: 8.5, 191: 7342892.912, 203: 1.0, 276: 161..."
0,202310,70.0,"{191: 6370117.0, 276: 50903.019, 380: 30931.35...","{191: 7413514.96, 276: 9844.0, 380: 22980.0, 6..."
0,202311,70.0,"{56: 29294.619, 191: 4594703.0, 276: 28959.564...","{56: 24200.0, 191: 5581309.89, 276: 7999.0, 38..."


In [375]:
response.json()

{'statusCode': 403,
 'message': 'Out of call volume quota. Quota will be replenished in 20:21:02.'}

In [370]:
params['partnerCode'] = 842
response = requests.get(URL, params)

In [367]:
response.json()

{'elapsedTime': '3.34 secs', 'count': 0, 'data': [], 'error': ''}

In [397]:
mirror_data.to_pickle('mirrored export 2021-2023.pkl')

In [389]:
mirror_data.to_pickle('mirrored export 2020.pkl')

In [381]:
mirror_data.to_pickle('mirrored export 2016-2020.pkl')

In [374]:
mirror_data.to_pickle('mirrored export 2016-2017.pkl')

In [329]:
mirror_data.to_pickle('mirrored export 2015.pkl')

In [398]:
m1 = pd.read_pickle('mirrored export 2015.pkl')
m2 = pd.read_pickle('mirrored export 2016-2020.pkl')
m3 = pd.read_pickle('mirrored export 2020.pkl')
m4 = pd.read_pickle('mirrored export 2021-2023.pkl')

In [412]:
mirror_data = pd.concat([m1, m2, m3, m4])
mirror_data = mirror_data.drop_duplicates(subset=['period', 'reporterCode'])

In [414]:
result = data.merge(mirror_data, on=['period', 'reporterCode'], how='outer')

# Заполняем пропуски

In [416]:
result[result['mirrorPrimaryValue'].isna() == True]

Unnamed: 0,period,primaryValue,netWgt,reporterCode,mirrorPrimaryValue,mirrorNetWgt
90,201501,{},{},96.0,,
125,201501,"{899: 86726.0, 757: 652745974.0, 51: 3239270.0...","{899: 153700.0, 757: 1221529642.0, 51: 5466335...",643.0,,
126,201501,{},{},84.0,,
127,201501,{},{},690.0,,
128,201501,{},{},446.0,,
...,...,...,...,...,...,...
16195,202312,{},{},400.0,,
16196,202312,{},{},634.0,,
16197,202312,"{288: 22675.04, 0: 2076474.704, 699: 2053799.663}","{288: 22296.0, 0: 3864296.0, 699: 3842000.0}",768.0,,
16198,202312,{},{},174.0,,


In [478]:
t = result[result['mirrorPrimaryValue'].isna() == True]

In [479]:
p = t.period

In [512]:
_data = pd.DataFrame({'period': [], 'reporterCode': [], 'mirrorPrimaryValue': [], 'mirrorNetWgt': []})

In [481]:
pc = t[t.period.isin(p)].reporterCode

In [469]:
response.json()

{'statusCode': 403,
 'message': 'Out of call volume quota. Quota will be replenished in 20:01:35.'}

In [513]:
period = ','.join(list(p.unique()))

pc = t[t.period.isin(p)].reporterCode

for i, partnerCode in enumerate(pc):
    time.sleep(10)

    params['period'] = period
    params['partnerCode'] = int(partnerCode)
    print(partnerCode)

    response = requests.get(URL, params, timeout=120)

    try:
        response_data = response.json()['data']
        df = pd.json_normalize(response_data)        
    except Exception as exc:        
        print(exc)
        time.sleep(10)
        df = pull_data(params)
        
    periods = period.split(',')
    
    for period_ in periods:
        # try:
        filtered_df = df[df['period'] == period_]
        
        reporters = list(filtered_df['reporterCode'])
        m_values = list(filtered_df['primaryValue'])
        m_wgts = list(filtered_df['netWgt'])
    
        m_primaryValues = {country: value for country, value in zip(reporters, m_values)}
        m_netWgts = {country: weight for country, weight in zip(reporters, m_wgts)}
    
        row = pd.DataFrame({'period': [period_], 'reporterCode': [partnerCode],
                                    'mirrorPrimaryValue': [m_primaryValues], 'mirrorNetWgt': [m_netWgts]})
    
        _data = pd.concat([_data, row])                   

887.0
854.0
882.0
887.0
887.0
882.0
887.0
4.0
887.0
882.0
887.0
4.0
887.0
854.0


In [514]:
_data

Unnamed: 0,period,reporterCode,mirrorPrimaryValue,mirrorNetWgt
0,201504,887.0,{},{}
0,201505,887.0,{},{}
0,201506,887.0,{},{}
0,201507,887.0,"{276: 329.88, 528: 73.673}","{276: 22.0, 528: 111.4}"
0,201509,887.0,{},{}
...,...,...,...,...
0,201507,854.0,"{384: 30883.879, 528: 12.096}","{384: 41141.0, 528: 19.276}"
0,201509,854.0,"{251: 1514.835, 710: 37794.016}","{251: 4.0, 710: 16723.81}"
0,201510,854.0,"{384: 47003.704, 710: 943.778}","{384: 89934.0, 710: 9.0}"
0,201511,854.0,"{251: 1912.082, 528: 77.299}","{251: 724.0, 528: 29.0}"


In [528]:
result = pd.concat([result, _data])

In [535]:
result = result[result.mirrorPrimaryValue.isna() != True]

In [537]:
result.drop_duplicates(subset=['period', 'reporterCode'], inplace=True)

In [539]:
result = result.sort_values(by='period').reset_index(drop=True)

In [541]:
result

Unnamed: 0,period,reporterCode,mirrorPrimaryValue,mirrorNetWgt,primaryValue,netWgt
0,201501,384.0,"{56: 18194.814, 251: 3404.953, 528: 42661.853,...","{56: 17139.0, 251: 211.0, 528: 32049.938, 710:...","{854: 1254417.242, 466: 730088.972, 120: 30223...","{854: 1786840.0, 466: 1054590.0, 120: 6040940...."
1,201501,144.0,"{56: 27262.064, 528: 42.998, 702: 33026484.285...","{56: 20955.0, 528: 71.539, 702: 58642710.0, 84...",{},{}
2,201501,458.0,"{276: 12978.333, 752: 740.238, 757: 21.249, 76...","{276: 3803.0, 752: 1508.407, 757: 1.0, 764: 97...","{0: 273762837.328, 702: 104531806.378, 598: 25...","{0: 445768037.7, 702: 192705452.47, 598: 46541..."
3,201501,174.0,{},{},,
4,201501,634.0,"{56: 168121.321, 124: 2398.829, 251: 68278.023...","{56: 98840.0, 124: 616.0, 251: 21380.0, 276: 2...",,
...,...,...,...,...,...,...
16195,202312,258.0,{554: 936.297},{554: 674.816},,
16196,202312,762.0,{710: 98.82},{710: 12.0},{},{}
16197,202312,764.0,"{757: 610.305, 276: 113446.262, 826: 12959.105...","{757: 15.0, 276: 25711.0, 826: 827.0, 56: 3.68...",{},{}
16198,202312,702.0,"{233: 63440.85, 528: 93807633.806, 516: 1.027,...","{233: 119752.0, 528: 90932700.0, 516: 0.99, 27...",{},{}


In [543]:
t = result[result['primaryValue'].isna() == True]

In [566]:
p = t.period.unique()

In [553]:
p1 = p[:12]
p2 = p[12:24]
p3 = p[24:36]
p4 = p[36:48]

In [572]:
periods = list(map(lambda k: ','.join(k),[p1, p2, p3, p4]))

In [558]:
typeCode = 'C' # Товары
freqCode = 'M' # Данные по месяцам
clCode = 'HS'  # На рандом

URL = f'https://comtradeapi.un.org/data/v1/get/{typeCode}/{freqCode}/{clCode}' 

params = {
    'subscription-key': PRIMARY_KEY, # Subscription key 
    'flowCode':         'X',         # Export
    'cmdCode':          hs_code,     #
    'reporterCode':     ','.join(map(str, CountryID))
}

In [559]:
_data = pd.DataFrame({'period': [], 'reporterCode': [], 'primaryValue': [], 'netWgt': []})

In [573]:
periods

['201501,201502,201503,201504,201505,201506,201507,201508,201509,201510,201511,201512',
 '201701,201702,201703,201704,201705,201706,201707,201708,201709,201710,201711,201712',
 '202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012',
 '202301,202302,202303,202304,202305,202306,202307,202308,202309,202310,202311,202312']

In [575]:
for period in periods:
    print(f'{period = }')
    params['period'] = period
    response = requests.get(URL, params)    
    response_data = response.json()['data']
    df = pd.json_normalize(response_data)

    p = df['period'].unique()
    pc = t[t.period.isin(p)].reporterCode
    for period_ in p:
        for reporterCode in pc:
            filtered_df = df[(df['reporterCode'] == reporterCode) & (df['period'] == period_)]
            
            partners = list(filtered_df['partnerCode'])
            wgts = list(filtered_df['netWgt'])
            values = list(filtered_df['primaryValue'])
    
            primaryValues = {country: value for country, value in zip(partners, values)}
            netWgts = {country: weight for country, weight in zip(partners, wgts)}
    
            row = pd.DataFrame({'period': [period_], 'reporterCode': [reporterCode], 'primaryValue': [primaryValues], 'netWgt': [netWgts]})
            _data = pd.concat([_data, row])

                                                          

period = '201501,201502,201503,201504,201505,201506,201507,201508,201509,201510,201511,201512'
period = '201701,201702,201703,201704,201705,201706,201707,201708,201709,201710,201711,201712'
period = '202001,202002,202003,202004,202005,202006,202007,202008,202009,202010,202011,202012'
period = '202301,202302,202303,202304,202305,202306,202307,202308,202309,202310,202311,202312'


64.67%\partnerCode = '31'

65.33%\partnerCode = '275'

In [576]:
_data

Unnamed: 0,period,reporterCode,primaryValue,netWgt
0,201501,174.0,"{0: 4.717, 450: 4.717}","{0: 9.0, 450: 9.0}"
0,201501,634.0,{},{}
0,201501,8.0,{},{}
0,201501,643.0,"{899: 86726.0, 757: 652745974.0, 51: 3239270.0...","{899: 153700.0, 757: 1221529642.0, 51: 5466335..."
0,201501,20.0,{},{}
...,...,...,...,...
0,202312,860.0,{},{}
0,202312,132.0,{},{}
0,202312,52.0,"{0: 36626.5, 899: 36626.5}","{0: 13191.0, 899: 13191.0}"
0,202312,112.0,{},{}


In [584]:
result.to_csv('export and mirrored export top 150 2015-2023.csv')

In [275]:
result = data.merge(mirror_data, on=['period', 'reporterCode'], how='outer')

In [276]:
result

Unnamed: 0,period,primaryValue,netWgt,reporterCode,mirrorPrimaryValue,mirrorNetWgt
0,201501,"{124: 456757863.0, 484: 792614010.0, 826: 6433...","{124: 0.0, 484: 0.0, 826: 0.0, 156: 0.0, 392: ...",842.0,"{222: 1135052.24, 233: 5822948.889, 251: 64511...","{222: 1304660.0, 233: 12276303.0, 251: 1440300..."
1,201501,{},{},784.0,"{512: 251555.244, 528: 18707.486, 579: 20577.6...","{512: 191292.0, 528: 13617.446, 579: 4056.0, 6..."
2,201501,"{76: 5187.0, 152: 68039.0, 170: 2405.0, 68: 11...","{76: 7953.421, 152: 104326.747, 170: 3687.677,...",484.0,"{528: 69062607.08, 604: 1340.14, 826: 23445.69...","{528: 108029300.0, 604: 2320.46, 826: 9944.0, ..."
3,201501,"{360: 464975156.303, 16: 995951.988, 258: 4610...","{360: 916569610.0, 16: 1695870.0, 258: 9127030...",702.0,"{246: 1.162, 251: 18357.694, 699: 145900347.54...","{246: 2.0, 251: 19722.0, 699: 301941000.0, 710..."
4,201501,"{36: 126368824.0, 152: 36192.0, 170: 65328.0, ...","{36: 234896000.0, 152: 24960.0, 170: 49920.0, ...",410.0,"{380: 30356495.896, 392: 16612528.746, 251: 12...","{380: 80033657.0, 392: 29793457.0, 251: 2480.0..."
...,...,...,...,...,...,...
16195,202312,{},{},400.0,,
16196,202312,{},{},634.0,,
16197,202312,"{288: 22675.04, 0: 2076474.704, 699: 2053799.663}","{288: 22296.0, 0: 3864296.0, 699: 3842000.0}",768.0,,
16198,202312,{},{},174.0,,


year = 2021
	0.00%\partnerCode = 842
	1.41%\partnerCode = 784
	2.82%\partnerCode = 484
	4.23%\partnerCode = 702
	5.63%\partnerCode = 410
	7.04%\partnerCode = 528
	8.45%\partnerCode = 392
	9.86%\partnerCode = 360
	11.27%\partnerCode = 124
	12.68%\partnerCode = 458
	14.08%\partnerCode = 276
	15.49%\partnerCode = 818
	16.90%\partnerCode = 76
	18.31%\partnerCode = 566
	19.72%\partnerCode = 710
	21.13%\partnerCode = 156
	22.54%\partnerCode = 56
	23.94%\partnerCode = 586
	25.35%\partnerCode = 608
	26.76%\partnerCode = 764
	28.17%\partnerCode = 36
	29.58%\partnerCode = 682
	30.99%\partnerCode = 826
	32.39%\partnerCode = 251
	33.80%\partnerCode = 170
	35.21%\partnerCode = 214
	36.62%\partnerCode = 24
	38.03%\partnerCode = 699
	39.44%\partnerCode = 434
	40.85%\partnerCode = 704
	42.25%\partnerCode = 422
	43.66%\partnerCode = 104
	45.07%\partnerCode = 380
	46.48%\partnerCode = 591
	47.89%\partnerCode = 757
	49.30%\partnerCode = 320
	50.70%\partnerCode = 752
	52.11%\partnerCode = 724
	53.52%\partnerCode = 804
	54.93%\partnerCode = 404
	56.34%\partnerCode = 516
	57.75%\partnerCode = 188
	59.15%\partnerCode = 792
	60.56%\partnerCode = 72
	61.97%\partnerCode = 554
	63.38%\partnerCode = 144
	64.79%\partnerCode = 246
	66.20%\partnerCode = 116
	67.61%\partnerCode = 887
	69.01%\partnerCode = 834
	70.42%\partnerCode = 68
	71.83%\partnerCode = 348
	73.24%\partnerCode = 340
	74.65%\partnerCode = 604
	76.06%\partnerCode = 196
	77.46%\partnerCode = 376
	78.87%\partnerCode = 800
	80.28%\partnerCode = 242
	81.69%\partnerCode = 579
	83.10%\partnerCode = 120
	84.51%\partnerCode = 222
	85.92%\partnerCode = 788
	87.32%\partnerCode = 384
	88.73%\partnerCode = 344
	90.14%\partnerCode = 504
	91.55%\partnerCode = 600
	92.96%\partnerCode = 40
	94.37%\partnerCode = 780
	95.77%\partnerCode = 203

# Обьединяем данные

# Заполняем пропуски

705, 643

In [99]:
data = pd.DataFrame({'period': [], 'reporterCode': [], 'primaryValue': [], 'netWgt': []})

In [107]:
for i, year in enumerate([2021]):
    print(f'{i / 8 * 100:0.2f}%\t{year = }')
    
    period = ','.join(str(year) + str(month) for month in months)

    params['period'] = period
    
    df = pull_data(params)

    periods = df['period'].unique()

    for period_ in periods:
        for reporterCode in CountryID:
            filtered_df = df[(df['reporterCode'] == reporterCode) & (df['period'] == period_)]
            
            partners = list(filtered_df['partnerCode'])
            wgts = list(filtered_df['netWgt'])
            values = list(filtered_df['primaryValue'])
    
            primaryValues = {country: value for country, value in zip(partners, values)}
            netWgts = {country: weight for country, weight in zip(partners, wgts)}
    
            row = pd.DataFrame({'period': [period_], 'cmdCode': [reporterCode], 'primaryValue': [primaryValues], 'netWgt': [netWgts]})
            data = pd.concat([data, row])
                                                              

0.00%	year = 2021


In [38]:
result = data.merge(mirror_data, on=['period', 'reporterCode'], how='outer')

In [None]:
result.to_pickle('Import and mirrored import 2015-2021.pkl')

In [40]:
result.to_pickle('Import and mirrored import 2022-2023.pkl')

In [109]:
df1 = pd.read_pickle('Import and mirrored import 2022-2023.pkl')

In [110]:
df2 = pd.read_pickle('Import and mirrored import 2015-2021.pkl')

In [111]:
df2 = df2.dropna(subset=['mirrorPrimaryValue'])

In [112]:
df1 = df1.dropna(subset=['mirrorPrimaryValue'])

In [115]:
df = pd.concat([df2, df1])

In [116]:
df.to_pickle('Import and mirrored import all countries 2015-2023.pkl')

In [117]:
df.to_csv('Import and mirrored import all countries 2015-2023.csv')

In [59]:
result.to_pickle('Import and mirrored import 2015-2023.pkl')

In [60]:
result.to_csv('Import and mirrored import 2015-2023.csv')

In [79]:
7692 - 7583

109