Let's find the name of our file.

In [1]:
import os
os.listdir()

['funnel.csv',
 'H5.ipynb',
 'purchases.txt',
 'purchase_log.txt',
 'Read-write and pip.ipynb',
 'visit_log.csv']

Let's try to open it.

In [2]:
with open('purchase_log.txt', 'r') as main_read_file:
    for i, line in enumerate(main_read_file):
        line = line.strip()
        print(i, line)
        if i > 9: break

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 285: character maps to <undefined>

We need to determine the file's encoding.

In [3]:
import chardet

with open('purchase_log.txt', 'rb') as main_read_file:
    # chardet works on raw bytes, so collect the first few lines in binary mode
    sample = b''
    for i, line in enumerate(main_read_file):
        sample += line
        if i > 9: break
    print(chardet.detect(sample)['encoding'])

utf-8
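If chardet is not installed, a rough standard-library alternative is to try a few candidate encodings in strict mode and keep the first one that decodes cleanly. This is only a sketch: the helper name `first_strict_decoding` and the candidate list are assumptions for illustration, not part of the original notebook. A successful decode is also not proof, because single-byte code pages such as cp1251 accept almost any byte sequence, so the order of candidates matters.

```python
def first_strict_decoding(data, candidates=('ascii', 'utf-8', 'cp1251')):
    """Return the first candidate encoding that decodes `data` without errors, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None


# A sample line in the same shape as the log, encoded as UTF-8 for the demo
sample = '{"user_id": "1840e0b9d4", "category": "Продукты"}\n'.encode('utf-8')
print(first_strict_decoding(sample))
```

Plain ASCII is tried first because every ASCII file is also valid UTF-8; the Cyrillic values make ASCII fail here, so the helper falls through to UTF-8.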


So it is UTF-8. Let's check that.

In [4]:
with open('purchase_log.txt', 'r', encoding = 'utf-8') as main_read_file:
    for i, line in enumerate(main_read_file):
        line = line.strip()
        print(i, line)
        if i > 9: break

0 {"user_id": "user_id", "category": "category"}
1 {"user_id": "1840e0b9d4", "category": "Продукты"}
2 {"user_id": "4e4f90fcfb", "category": "Электроника"}
3 {"user_id": "afea8d72fc", "category": "Электроника"}
4 {"user_id": "373a6055fe", "category": "Бытовая техника"}
5 {"user_id": "9b2ab046f3", "category": "Электроника"}
6 {"user_id": "9f39d307c3", "category": "Электроника"}
7 {"user_id": "44edeffc91", "category": "Продукты"}
8 {"user_id": "704474fa2d", "category": "Продукты"}
9 {"user_id": "1de31be403", "category": "Бытовая техника"}
10 {"user_id": "b71f36a5e4", "category": "Продукты"}
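This output also suggests where the earlier UnicodeDecodeError came from. Assuming the default encoding on this machine is cp1252 (the notebook does not say so; the 'charmap' wording merely hints at a Windows code page), the UTF-8 bytes of some Cyrillic letters include 0x8f, which cp1252 leaves undefined. A small sketch that reproduces the failure on one category value:

```python
# Assumption: the platform default was cp1252; it is named explicitly here
raw = 'Бытовая техника'.encode('utf-8')

try:
    raw.decode('cp1252')
    failed = False
    bad_byte = None
except UnicodeDecodeError as err:
    failed = True
    bad_byte = raw[err.start]  # the first byte cp1252 cannot map

print(failed, hex(bad_byte) if bad_byte is not None else None)
```

The letter 'я' is encoded in UTF-8 as the two bytes 0xd1 0x8f, and the second of them is the one reported in the error message.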


Works just fine. Our task is:

Convert the contents of the file purchase_log.txt into a dictionary purchases of the form:
{'1840e0b9d4': 'Продукты', …}

In [5]:
import json

with open('purchase_log.txt', 'r', encoding = 'utf-8') as main_read_file:
    purchases = {}
    for i, line in enumerate(main_read_file):
        if i == 0: continue
        line = line.strip()
        temp_container = json.loads(line)
        purchases[temp_container['user_id']] = temp_container['category']
        print(purchases)
        if i>5: break

{'1840e0b9d4': 'Продукты'}
{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника'}
{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника'}
{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника', '373a6055fe': 'Бытовая техника'}
{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника', '373a6055fe': 'Бытовая техника', '9b2ab046f3': 'Электроника'}
{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника', '373a6055fe': 'Бытовая техника', '9b2ab046f3': 'Электроника', '9f39d307c3': 'Электроника'}
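The same logic can be wrapped in a small function, which makes it easy to try on an in-memory sample without touching the real file. This is a sketch; the name `build_purchases` and the blank-line check are additions for illustration, and the sample lines are copied from the output above.

```python
import io
import json


def build_purchases(lines):
    """Map user_id -> category from an iterable of JSON lines, skipping the header."""
    purchases = {}
    for i, line in enumerate(lines):
        line = line.strip()
        if i == 0 or not line:  # first line is the header; ignore blank lines
            continue
        record = json.loads(line)
        purchases[record['user_id']] = record['category']
    return purchases


sample_log = io.StringIO(
    '{"user_id": "user_id", "category": "category"}\n'
    '{"user_id": "1840e0b9d4", "category": "Продукты"}\n'
    '{"user_id": "4e4f90fcfb", "category": "Электроника"}\n'
)
result = build_purchases(sample_log)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the `io.StringIO` sample.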


We can see that everything works. Let's write that into a file.

I limited this to the first few records because VS Code crashes when I try to print the full contents of the file: the whole dictionary comes out as one very long line.

In [6]:
# Write
with open('purchase_log.txt', 'r', encoding = 'utf-8') as main_read_file:
    purchases = {}
    for i, line in enumerate(main_read_file):
        if i == 0: continue
        line = line.strip()
        temp_container = json.loads(line)
        purchases[temp_container['user_id']] = temp_container['category']
        if i>5: break
    with open('purchases.txt', 'w', encoding = 'utf-8') as write_file:
        write_file.write(json.dumps(purchases))

# Read our new file (the handle gets its own name so it does not shadow the purchases dict)
with open('purchases.txt', 'r', encoding = 'utf-8') as read_file:
    dict_ = json.loads(read_file.read())
    print(dict_)

{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника', '373a6055fe': 'Бытовая техника', '9b2ab046f3': 'Электроника', '9f39d307c3': 'Электроника'}
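One detail worth knowing about the file on disk: by default `json.dumps` escapes every non-ASCII character as `\uXXXX`, so purchases.txt contains escape sequences rather than readable Cyrillic. Reading it back with `json.loads` restores the original text, which is why the printed dictionary looks normal. The sketch below compares the default with `ensure_ascii=False`, which keeps the text readable in the file; the one-entry dictionary is just a sample.

```python
import json

purchases = {'1840e0b9d4': 'Продукты'}

escaped = json.dumps(purchases)                       # default: ensure_ascii=True
readable = json.dumps(purchases, ensure_ascii=False)  # keeps Cyrillic as-is

print(escaped)
print(readable)

# Both forms decode back to the same dictionary
assert json.loads(escaped) == json.loads(readable) == purchases
```

If the file is meant to be inspected by hand, `ensure_ascii=False` together with `encoding='utf-8'` on `open` is usually the more convenient combination.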


If we want to read the whole file at once instead of line by line, here's how:

In [7]:
# Write
with open('purchase_log.txt', 'r', encoding = 'utf-8') as main_read_file:
    purchases = {}
    temp_container = main_read_file.read().split('\n')
    for i, line in enumerate(temp_container):
        if i == 0: continue
        line = json.loads(line)
        purchases[line['user_id']] = line['category']
        if i>5: break
    with open('purchases.txt', 'w', encoding = 'utf-8') as write_file:
        write_file.write(json.dumps(purchases))

# Read our new file (the handle gets its own name so it does not shadow the purchases dict)
with open('purchases.txt', 'r', encoding = 'utf-8') as read_file:
    dict_ = json.loads(read_file.read())
    print(dict_)

{'1840e0b9d4': 'Продукты', '4e4f90fcfb': 'Электроника', 'afea8d72fc': 'Электроника', '373a6055fe': 'Бытовая техника', '9b2ab046f3': 'Электроника', '9f39d307c3': 'Электроника'}
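One thing to watch for with the read-everything approach once the `break` is removed: if the file ends with a newline (not verified here for purchase_log.txt), splitting on `'\n'` leaves an empty string as the last element, and `json.loads('')` raises an error. A minimal sketch of the problem and one way around it, using a single sample line:

```python
import json

text = '{"user_id": "1840e0b9d4", "category": "Продукты"}\n'

parts = text.split('\n')
print(parts[-1] == '')  # the trailing newline produces an empty last element

try:
    json.loads(parts[-1])
    empty_ok = True
except json.JSONDecodeError:
    empty_ok = False

# Skipping blank pieces avoids the error
records = [json.loads(part) for part in parts if part.strip()]
print(empty_ok, records)
```

Iterating over the file object line by line, as in the earlier cells, only hits this issue if the file contains genuinely blank lines.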


So, everything is fine. Now let's process the whole log and write the result to a file.

In [8]:
with open('purchase_log.txt', 'r', encoding = 'utf-8') as main_read_file:
    purchases = {}
    for i, line in enumerate(main_read_file):
        if i == 0: continue
        line = line.strip()
        temp_container = json.loads(line)
        purchases[temp_container['user_id']] = temp_container['category']
    with open('purchases.txt', 'w', encoding = 'utf-8') as write_file:
        write_file.write(json.dumps(purchases))

Second task:

For each user_id in the file visit_log.csv, determine a third column with the purchase category, if there was a purchase; the file visit_log.csv itself must not be modified. Write to the file funnel.csv the visits from visit_log.csv that had purchases, with the category specified.

In [9]:
os.listdir()

['funnel.csv',
 'H5.ipynb',
 'purchases.txt',
 'purchase_log.txt',
 'Read-write and pip.ipynb',
 'visit_log.csv']

In [10]:
with open('visit_log.csv', 'r') as csv_table:
    for i, line in enumerate(csv_table):
        print(i, line)
        if i>2: break

0 user_id,source

1 6450655ae8,other

2 b4ea53e670,other

3 0e37347152,other
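The blank lines in this output are not in the file. Each line read from a file keeps its trailing '\n', and `print` adds one more. A small sketch with an in-memory stand-in for the first two lines of visit_log.csv shows two ways to avoid the doubling:

```python
import io

csv_table = io.StringIO('user_id,source\n6450655ae8,other\n')

raw_lines = list(csv_table)                        # each element still ends with '\n'
clean_lines = [line.rstrip('\n') for line in raw_lines]

for i, line in enumerate(raw_lines):
    print(i, line, end='')                         # option 1: stop print adding a newline

for i, line in enumerate(clean_lines):
    print(i, line)                                 # option 2: strip the newline first
```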



In [11]:
with open('purchases.txt', 'r', encoding = 'utf-8') as user_sales_file:
    user_sales = json.loads(user_sales_file.read())
    with open('visit_log.csv', 'r') as csv_table:
        with open('funnel.csv', 'w', encoding='utf-8') as write_csv_table:
            for i, line in enumerate(csv_table):
                line = line.strip('\n')
                user_id, *_ = line.split(',')
                if i == 0:
                    write_line = line + ',category\n'
                    write_csv_table.write(write_line)
                elif user_id in user_sales:
                    write_line = line + ',' + user_sales[user_id] + '\n'
                    write_csv_table.write(write_line)

with open('funnel.csv', 'r', encoding='utf-8') as test_file:
    for i, line in enumerate(test_file):
        print(i, line)
        if i>2: break

0 user_id,source,category

1 1840e0b9d4,other,Продукты

2 4e4f90fcfb,context,Электроника

3 afea8d72fc,other,Электроника
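The manual `split(',')` works for this file because none of the values contain commas or quotes. As a hedged alternative, the same filtering can be sketched with the standard `csv` module, which handles quoting if it ever appears. The function name `write_funnel` is made up for this example, and the in-memory sample rows are taken from the outputs above.

```python
import csv
import io


def write_funnel(visits, purchases, out):
    """Copy visits whose user_id has a purchase to `out`, adding a category column."""
    reader = csv.reader(visits)
    writer = csv.writer(out, lineterminator='\n')
    header = next(reader)
    writer.writerow(header + ['category'])
    for row in reader:
        if not row:  # skip blank lines
            continue
        category = purchases.get(row[0])
        if category is not None:
            writer.writerow(row + [category])


visits = io.StringIO('user_id,source\n6450655ae8,other\n1840e0b9d4,other\n')
purchases = {'1840e0b9d4': 'Продукты'}
out = io.StringIO()
write_funnel(visits, purchases, out)
print(out.getvalue())
```

With real files, the csv documentation recommends opening them with `newline=''` so the module controls line endings itself.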

