# Pandas

Materials:
* S. V. Makrushin, "Lecture 2: The Pandas Library"
* https://pandas.pydata.org/docs/user_guide/index.html#
* https://pandas.pydata.org/docs/reference/index.html
* Wes McKinney, "Python for Data Analysis"

## Problems to work through together

In [1]:
import pandas as pd

1. Load the data from the file `sp500hst.txt` and name the columns according to their contents: `"date", "ticker", "open", "high", "low", "close", "volume"`.

In [2]:
headers_list = ["date", "ticker", "open", "high", "low", "close", "volume"]
df = pd.read_csv("./data/sp500hst.txt", sep=",", names=headers_list, header=None)
print(df)

            date ticker   open     high     low  close  volume
0       20090821      A  25.60  25.6100  25.220  25.55   34758
1       20090824      A  25.64  25.7400  25.330  25.50   22247
2       20090825      A  25.50  25.7000  25.225  25.34   30891
3       20090826      A  25.32  25.6425  25.145  25.48   33334
4       20090827      A  25.50  25.5700  25.230  25.54   70176
...          ...    ...    ...      ...     ...    ...     ...
122569  20100813    ZMH  51.72  51.9000  51.380  51.44   14561
122570  20100816    ZMH  51.13  51.4700  50.600  51.00   13489
122571  20100817    ZMH  51.14  51.6000  50.890  51.21   20498
122572  20100819    ZMH  51.63  51.6300  50.170  50.22   18259
122573  20100820    ZMH  50.03  50.5500  49.480  49.82   17792

[122574 rows x 7 columns]


2. Compute the mean of each of the columns numbered 3-6.

In [3]:
df.iloc[:, 3:6].mean()

high     43.102243
low      42.054464
close    42.601865
dtype: float64
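Note that `iloc` uses zero-based positions and excludes the end of the slice, so `3:6` covers only `high`, `low`, and `close`. The sketch below uses a small made-up frame (the values are invented for illustration) to show which columns each slice picks under different readings of "columns 3-6"; which reading the task intends is an assumption the reader has to settle.

```python
import pandas as pd

# Tiny made-up frame with the same column layout as the task (values are illustrative).
toy = pd.DataFrame({
    "date": [20090821, 20090824, 20090825],
    "ticker": ["A", "A", "A"],
    "open": [1.0, 2.0, 3.0],
    "high": [2.0, 3.0, 4.0],
    "low": [0.5, 1.5, 2.5],
    "close": [1.5, 2.5, 3.5],
    "volume": [10, 20, 30],
})

# Zero-based positions 3, 4, 5: the slice end is exclusive.
zero_based_3_to_5 = toy.iloc[:, 3:6].mean()

# Zero-based positions 3..6 inclusive need an end of 7.
zero_based_3_to_6 = toy.iloc[:, 3:7].mean()

# One-based columns 3..6 (open, high, low, close) are zero-based positions 2..5.
one_based_3_to_6 = toy.iloc[:, 2:6].mean()

print(list(zero_based_3_to_5.index))
print(list(zero_based_3_to_6.index))
print(list(one_based_3_to_6.index))
```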

3. Add a column containing only the number of the month that the date falls in.

In [4]:
pd.to_datetime(df['date'],format="%Y%m%d").dt.month.head()

0    8
1    8
2    8
3    8
4    8
Name: date, dtype: int64
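The cell above computes the month numbers but does not store them in the table. A minimal sketch of the assignment step, on a made-up frame; the column name `month` is an assumption, since the task does not prescribe one.

```python
import pandas as pd

# Made-up dates stored as YYYYMMDD integers, as in sp500hst.txt.
toy = pd.DataFrame({"date": [20090821, 20091201, 20100115]})

# Parse the values as dates and keep only the month number in a new column.
parsed = pd.to_datetime(toy["date"].astype(str), format="%Y%m%d")
toy["month"] = parsed.dt.month

print(toy)
```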

4. Compute the total trading volume for each ticker value.

In [5]:
df['ticker'].str.lower().head()
df['open'].mean()

42.59545765904678

In [6]:
df.groupby('ticker')['open'].mean()

ticker
A        30.234857
AA       13.086959
AAPL    221.342427
ABC      27.432122
ABT      50.996776
           ...    
XTO      44.640338
YHOO     15.854774
YUM      37.456898
ZION     19.694057
ZMH      56.220980
Name: open, Length: 524, dtype: float64
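The cells above average the `open` price, whereas the task asks for the total `volume` per ticker. A sketch of that aggregation on a made-up frame (the tickers and volumes are invented):

```python
import pandas as pd

# Made-up rows: two tickers with two trading days each, one ticker with one day.
toy = pd.DataFrame({
    "ticker": ["A", "A", "AA", "AA", "AAPL"],
    "volume": [100, 150, 10, 20, 7],
})

# Group rows by ticker and sum the volume inside each group.
total_volume = toy.groupby("ticker")["volume"].sum()

print(total_volume)
```

On the real data the same expression would be applied to `df` instead of `toy`.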

5. Load the data from the file `sp500hst.txt` and name the columns according to their contents: `"date", "ticker", "open", "high", "low", "close", "volume"`. Add a column with the full company name for each ticker, using the data from the file `sp_data2.csv`. Handle tickers whose names are missing gracefully.

In [7]:
headers = ["ticker", "company", "percent"]
sp = pd.read_csv("./data/sp_data2.csv", sep=";", names=headers)
print(sp)
sp.head()

    ticker          company percent
0     AAPL            Apple    3.6%
1     AMZN       Amazon.com    3.2%
2    GOOGL         Alphabet    3.1%
3     GOOG         Alphabet    3.1%
4     MSFT        Microsoft    3.0%
..     ...              ...     ...
500    SCG            SCANA    0.0%
501    AIZ         Assurant    0.0%
502    AYI    Acuity Brands    0.0%
503    HRB        H&R Block    0.0%
504    RRC  Range Resources    0.0%

[505 rows x 3 columns]


Unnamed: 0,ticker,company,percent
0,AAPL,Apple,3.6%
1,AMZN,Amazon.com,3.2%
2,GOOGL,Alphabet,3.1%
3,GOOG,Alphabet,3.1%
4,MSFT,Microsoft,3.0%


In [8]:
pd.merge(df, sp, how="inner", on="ticker")

Unnamed: 0,date,ticker,open,high,low,close,volume,company,percent
0,20090821,A,25.60,25.6100,25.220,25.55,34758,Agilent Technologies,0.1%
1,20090824,A,25.64,25.7400,25.330,25.50,22247,Agilent Technologies,0.1%
2,20090825,A,25.50,25.7000,25.225,25.34,30891,Agilent Technologies,0.1%
3,20090826,A,25.32,25.6425,25.145,25.48,33334,Agilent Technologies,0.1%
4,20090827,A,25.50,25.5700,25.230,25.54,70176,Agilent Technologies,0.1%
...,...,...,...,...,...,...,...,...,...
82167,20100813,ZION,20.17,20.4300,19.840,19.89,25193,Zions Bancorp,0.0%
82168,20100816,ZION,19.81,19.9600,19.600,19.95,25914,Zions Bancorp,0.0%
82169,20100817,ZION,20.07,20.4700,19.830,20.31,31717,Zions Bancorp,0.0%
82170,20100819,ZION,19.83,20.0000,19.130,19.35,45935,Zions Bancorp,0.0%
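The inner merge above drops every price row whose ticker has no entry in the names table. One possible way to keep those rows is a left merge followed by `fillna`; the sketch below uses made-up frames, and the placeholder label `"unknown"` is an arbitrary assumption.

```python
import pandas as pd

# Made-up price rows; "ZMH" has no matching name below.
prices = pd.DataFrame({
    "ticker": ["A", "A", "ZMH"],
    "close": [25.55, 25.50, 49.82],
})
names = pd.DataFrame({
    "ticker": ["A"],
    "company": ["Agilent Technologies"],
})

# A left merge keeps every price row; unmatched tickers get NaN in `company`.
merged = prices.merge(names, how="left", on="ticker")

# Replace missing names with an explicit placeholder.
merged["company"] = merged["company"].fillna("unknown")

print(merged)
```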


## Lab Assignment 2

### Basic operations with `DataFrame`

1.1 The files `recipes_sample.csv` and `reviews_sample.csv` contain information about recipes and reviews of those recipes, respectively. Load the data from these files as `pd.DataFrame` objects named `recipes` and `reviews`. Make sure the index column(s) are read correctly.

In [9]:
recipes = pd.read_csv("./data/recipes_sample.csv", sep=",", parse_dates=['submitted'])
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...,...
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
29998,zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


In [10]:
reviews = pd.read_csv("./data/reviews_sample.csv", sep=",", parse_dates=["date"])
reviews

Unnamed: 0.1,Unnamed: 0,user_id,recipe_id,date,rating,review
0,370476,21752,57993,2003-05-01,5,Last week whole sides of frozen salmon fillet ...
1,624300,431813,142201,2007-09-16,5,So simple and so tasty! I used a yellow capsi...
2,187037,400708,252013,2008-01-10,4,"Very nice breakfast HH, easy to make and yummy..."
3,706134,2001852463,404716,2017-12-11,5,These are a favorite for the holidays and so e...
4,312179,95810,129396,2008-03-14,5,Excellent soup! The tomato flavor is just gre...
...,...,...,...,...,...,...
126691,1013457,1270706,335534,2009-05-17,4,This recipe was great! I made it last night. I...
126692,158736,2282344,8701,2012-06-03,0,This recipe is outstanding. I followed the rec...
126693,1059834,689540,222001,2008-04-08,5,"Well, we were not a crowd but it was a fabulou..."
126694,453285,2000242659,354979,2015-06-02,5,I have been a steak eater and dedicated BBQ gr...
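In the output above, `reviews` carries a regular column named `Unnamed: 0`, which usually means a saved index was read as data. A sketch of how `index_col=0` changes that, using a made-up in-memory CSV instead of the real file:

```python
import io

import pandas as pd

# Made-up CSV text whose first, unnamed column is a saved index.
csv_text = (
    ",user_id,recipe_id,rating\n"
    "370476,21752,57993,5\n"
    "624300,431813,142201,5\n"
)

# Default read: the unnamed column becomes a regular column called "Unnamed: 0".
as_column = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the same column is used as the row index instead.
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(as_column.columns))
print(list(as_index.columns), list(as_index.index))
```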


1.2 For each table, print its basic parameters:
* number of data points (rows);
* number of columns;
* data type of each column.

In [11]:
print(f"recipes: {recipes.shape[0]} {len(recipes.columns)}")
recipes.dtypes

recipes: 30000 8


name                      object
id                         int64
minutes                    int64
contributor_id             int64
submitted         datetime64[ns]
n_steps                  float64
description               object
n_ingredients            float64
dtype: object

In [12]:
print(f"reviews: {reviews.shape[0]} {len(reviews.columns)}")
reviews.dtypes

reviews: 126696 6


Unnamed: 0             int64
user_id                int64
recipe_id              int64
date          datetime64[ns]
rating                 int64
review                object
dtype: object

1.3 Find out which columns of the tables contain missing values. Compute the fraction of rows that contain missing values relative to the total number of rows.

In [13]:
all_count = recipes.shape[0]
without_none_count = recipes.dropna().shape[0]
none_count = all_count - without_none_count
print(f"{all_count}\n{without_none_count}\n{none_count}")

str(round(none_count*100/all_count,2))+"%"

30000
12946
17054


'56.85%'

In [14]:
recipes.isnull().sum()

name                  0
id                    0
minutes               0
contributor_id        0
submitted             0
n_steps           11190
description         623
n_ingredients      8880
dtype: int64

In [15]:
all_count = reviews.shape[0]
without_none_count = reviews.dropna().shape[0]
none_count = all_count - without_none_count
print(f"{all_count}\n{without_none_count}\n{none_count}")

str(round(none_count*100/all_count,2))+"%"

126696
126679
17


'0.01%'

In [16]:
reviews.isnull().sum()

Unnamed: 0     0
user_id        0
recipe_id      0
date           0
rating         0
review        17
dtype: int64
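The same two quantities can also be obtained without `dropna`, by flagging rows that contain at least one missing value. A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame: rows 1, 2 and 3 each contain at least one missing value.
toy = pd.DataFrame({
    "n_steps": [4.0, np.nan, 16.0, np.nan],
    "description": ["a", "b", None, None],
    "minutes": [10, 20, 30, 40],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# True for every row with at least one missing value; the mean of a
# boolean Series is the fraction of True entries.
row_has_missing = toy.isna().any(axis=1)
fraction = row_has_missing.mean()

print(missing_per_column)
print(f"{fraction:.2%}")
```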

1.4 Compute the mean of each numeric column (where it makes sense).

In [17]:
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...,...
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
29998,zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


In [18]:
for column in ("minutes","n_ingredients","n_steps"):
    print(f"{column} {round(recipes[column].mean(),2)}")

minutes 123.36
n_ingredients 9.01
n_steps 9.81


In [19]:
reviews

Unnamed: 0.1,Unnamed: 0,user_id,recipe_id,date,rating,review
0,370476,21752,57993,2003-05-01,5,Last week whole sides of frozen salmon fillet ...
1,624300,431813,142201,2007-09-16,5,So simple and so tasty! I used a yellow capsi...
2,187037,400708,252013,2008-01-10,4,"Very nice breakfast HH, easy to make and yummy..."
3,706134,2001852463,404716,2017-12-11,5,These are a favorite for the holidays and so e...
4,312179,95810,129396,2008-03-14,5,Excellent soup! The tomato flavor is just gre...
...,...,...,...,...,...,...
126691,1013457,1270706,335534,2009-05-17,4,This recipe was great! I made it last night. I...
126692,158736,2282344,8701,2012-06-03,0,This recipe is outstanding. I followed the rec...
126693,1059834,689540,222001,2008-04-08,5,"Well, we were not a crowd but it was a fabulou..."
126694,453285,2000242659,354979,2015-06-02,5,I have been a steak eater and dedicated BBQ gr...


In [20]:
for column in ["rating"]:
    print(f"{column} {round(reviews[column].mean(),2)}")

rating 4.41


1.5 Create a series of 10 random recipe names.

In [21]:
result = recipes["name"].sample(10)
print(type(result))
result

<class 'pandas.core.series.Series'>


10945                               fiesta casserole
15855               lemon cooler  knock off  cookies
24827                sour cream bread  bread machine
4088                     bulgur and butternut squash
20278                           peanut butter snacks
17681                    milwaukee sweet tart supper
11919    german meatloaf  falscher hase   false hare
9268                       delicious breakfast toast
10852                              favorite cioppino
6397                              chive french toast
Name: name, dtype: object

1.6 Change the index of the `reviews` table so that the rows are numbered starting from zero.

In [22]:
reviews

Unnamed: 0.1,Unnamed: 0,user_id,recipe_id,date,rating,review
0,370476,21752,57993,2003-05-01,5,Last week whole sides of frozen salmon fillet ...
1,624300,431813,142201,2007-09-16,5,So simple and so tasty! I used a yellow capsi...
2,187037,400708,252013,2008-01-10,4,"Very nice breakfast HH, easy to make and yummy..."
3,706134,2001852463,404716,2017-12-11,5,These are a favorite for the holidays and so e...
4,312179,95810,129396,2008-03-14,5,Excellent soup! The tomato flavor is just gre...
...,...,...,...,...,...,...
126691,1013457,1270706,335534,2009-05-17,4,This recipe was great! I made it last night. I...
126692,158736,2282344,8701,2012-06-03,0,This recipe is outstanding. I followed the rec...
126693,1059834,689540,222001,2008-04-08,5,"Well, we were not a crowd but it was a fabulou..."
126694,453285,2000242659,354979,2015-06-02,5,I have been a steak eater and dedicated BBQ gr...


In [23]:
reviews.rename(columns={'Unnamed: 0': 'index'}, inplace=True)

In [24]:
reviews = reviews.set_index('index')
reviews

index,user_id,recipe_id,date,rating,review
370476,21752,57993,2003-05-01,5,Last week whole sides of frozen salmon fillet ...
624300,431813,142201,2007-09-16,5,So simple and so tasty! I used a yellow capsi...
187037,400708,252013,2008-01-10,4,"Very nice breakfast HH, easy to make and yummy..."
706134,2001852463,404716,2017-12-11,5,These are a favorite for the holidays and so e...
312179,95810,129396,2008-03-14,5,Excellent soup! The tomato flavor is just gre...
...,...,...,...,...,...
1013457,1270706,335534,2009-05-17,4,This recipe was great! I made it last night. I...
158736,2282344,8701,2012-06-03,0,This recipe is outstanding. I followed the rec...
1059834,689540,222001,2008-04-08,5,"Well, we were not a crowd but it was a fabulou..."
453285,2000242659,354979,2015-06-02,5,I have been a steak eater and dedicated BBQ gr...
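The cells above promote the saved `Unnamed: 0` column to the index, which keeps the original labels rather than numbering rows from zero. One common way to get a fresh 0..n-1 index is `reset_index(drop=True)`; a minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with arbitrary, non-sequential index labels.
toy = pd.DataFrame(
    {"user_id": [21752, 431813, 400708], "rating": [5, 5, 4]},
    index=[370476, 624300, 187037],
)

# drop=True discards the old labels instead of turning them into a column.
renumbered = toy.reset_index(drop=True)

print(renumbered)
```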


1.7 Display information about the recipes that take no more than 20 minutes to prepare and have no more than 5 ingredients.

In [25]:
condition = (recipes['minutes'] <= 20) & (recipes['n_ingredients'] <= 5)
recipes.loc[condition]

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
28,quick biscuit bread,302399,20,213909,2008-05-06,11.0,this is a wonderful quick bread to make as an ...,5.0
60,peas fit for a king or queen,303944,20,213909,2008-05-16,,this recipe is so simple and the flavors are s...,5.0
90,hawaiian sunrise mimosa,100837,5,58104,2004-09-29,4.0,pineapple mimosa was changed to hawaiian sunri...,3.0
91,tasty dish s banana pudding in 2 minutes,286484,2,47892,2008-02-13,,"""mmmm, i love bananas!"" a --tasty dish-- origi...",4.0
94,1 minute meatballs,11361,13,4470,2001-09-03,,this is a real short cut for cooks in a hurry....,2.0
...,...,...,...,...,...,...,...,...
29873,zip and steam red potatoes with butter and garlic,304922,13,724218,2008-05-27,9.0,"i haven't tried this yet, but i am going to so...",5.0
29874,ziplock vanilla ice cream,74250,10,24386,2003-10-29,8.0,a fun thing for kids to do. may want to use mi...,3.0
29905,zucchini and corn with cheese,256177,15,305531,2007-09-29,4.0,from betty crocker fresh spring recipes. i lik...,5.0
29980,zucchini with jalapeno monterey jack,320622,10,305531,2008-08-20,3.0,simple and yummy!,3.0


### Working with dates in `pandas`

In [26]:
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...,...
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
29998,zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


2.1 Convert the `submitted` column of the `recipes` table to a datetime format. Modify the solution to task 1.1 so that the column is read in the required format right away.

In [27]:
recipes.dtypes

name                      object
id                         int64
minutes                    int64
contributor_id             int64
submitted         datetime64[ns]
n_steps                  float64
description               object
n_ingredients            float64
dtype: object
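Because task 1.1 already passes `parse_dates`, the column arrives as `datetime64[ns]`. For the first half of the task, a sketch of converting a string column after loading, on a made-up frame (the `YYYY-MM-DD` format is an assumption based on the displayed values):

```python
import pandas as pd

# Made-up frame where the dates were read as plain strings.
toy = pd.DataFrame({"submitted": ["2002-10-25", "2003-07-26"]})

# Convert the strings to datetime values in place.
toy["submitted"] = pd.to_datetime(toy["submitted"], format="%Y-%m-%d")

print(toy.dtypes)
```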

2.2 Display information about the recipes added to the dataset no later than 2010.

In [28]:
not_later_than_2010 = recipes['submitted'].dt.year <= 2010
recipes.loc[not_later_than_2010]

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...,...
29993,zuni caf zucchini pickles,316950,2895,62264,2008-07-31,,refrigerator pickles for some of the zucchini ...,8.0
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,


### Working with string data in `pandas`

In [29]:
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,
...,...,...,...,...,...,...,...,...
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,
29998,zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,


3.1 Add a `description_length` column to the `recipes` table that stores the length of the recipe description from the `description` column.

In [30]:
description_length = recipes["description"].str.len()
recipes["description_length"] = description_length
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length
0,george s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0
1,healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0
2,i can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0
3,italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0
4,love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0
...,...,...,...,...,...,...,...,...,...
29995,zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0
29996,zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0
29997,zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0
29998,zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0


3.2 Change the name of each recipe in the `recipes` table so that every word in the name starts with a capital letter.

In [31]:
recipes["name"] = recipes["name"].str.capitalize()
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length
0,George s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0
1,Healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0
2,I can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0
3,Italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0
4,Love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0
...,...,...,...,...,...,...,...,...,...
29995,Zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0
29996,Zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0
29997,Zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0
29998,Zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0
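The cell above uses `str.capitalize()`, which upper-cases only the first character of the whole string. To capitalize every word, `str.title()` is the usual choice; the sketch below compares the two on a made-up series:

```python
import pandas as pd

# Made-up recipe names in lower case.
names = pd.Series(["zydeco soup", "italian gut busters"])

# capitalize(): only the first character of each string is upper-cased.
capitalized = names.str.capitalize()

# title(): the first character of every word is upper-cased.
titled = names.str.title()

print(capitalized.tolist())
print(titled.tolist())
```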


3.3 Add a `name_word_count` column to the `recipes` table that stores the number of words in the recipe name (assume that words in the name are separated only by spaces).

In [32]:
recipes["name_word_count"] = recipes["name"].str.split().apply(len)
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
0,George s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0,8
1,Healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0,5
2,I can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0,7
3,Italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0,3
4,Love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0,8
...,...,...,...,...,...,...,...,...,...,...
29995,Zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0,8
29996,Zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0,4
29997,Zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0,5
29998,Zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0,2


### Grouping `pd.DataFrame` tables

In [33]:
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
0,George s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0,8
1,Healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0,5
2,I can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0,7
3,Italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0,3
4,Love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0,8
...,...,...,...,...,...,...,...,...,...,...
29995,Zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0,8
29996,Zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0,4
29997,Zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0,5
29998,Zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0,2


4.1 Count the number of recipes submitted by each contributor (`contributor_id`). Which contributor added the most recipes?

In [34]:
recipes_per_contributor = recipes.groupby("contributor_id")["id"].count()
recipes_per_contributor

contributor_id
1530            5
1533          186
1534           50
1535           40
1538            8
             ... 
2001968497      2
2002059754      1
2002234079      1
2002234259      1
2002247884      1
Name: id, Length: 8404, dtype: int64

In [35]:
recipes.loc[recipes['contributor_id'] == 1530]

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
2429,Basil parmesan biscuits,975,27,1530,1999-08-14,,perfect with pasta or other italian dishes,10.0,42.0,3
4627,Caramel apple milkshakes,128,25,1530,1999-09-12,4.0,,5.0,,3
6778,Chocolate tapioca pudding,224,50,1530,1999-08-14,,yummy!,,6.0,3
8906,Cucumber relish,671,40,1530,1999-09-10,,,10.0,,2
26182,Stuffed mozzarella,1155,10,1530,1999-08-28,4.0,,,,2
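The cells above count recipes per contributor but do not answer which contributor added the most. A sketch of that last step with `idxmax`, on a made-up frame:

```python
import pandas as pd

# Made-up recipes: contributor 1533 has three, 1534 has two, 1530 has one.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "contributor_id": [1530, 1533, 1533, 1533, 1534, 1534],
})

counts = toy.groupby("contributor_id")["id"].count()

# idxmax() returns the index label (the contributor id) of the largest count.
top_contributor = counts.idxmax()
top_count = counts.max()

print(top_contributor, top_count)
```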


4.2 Compute the average rating of each recipe. For how many recipes are there no reviews?

In [36]:
mean_rating_by_recipe = reviews.groupby("recipe_id")["rating"].mean()
mean_rating_by_recipe

recipe_id
48        1.000000
55        4.750000
66        4.944444
91        4.750000
94        5.000000
            ...   
536547    5.000000
536610    0.000000
536728    4.000000
536729    4.750000
536747    0.000000
Name: rating, Length: 28100, dtype: float64

In [37]:
reviews.loc[reviews["recipe_id"] == 55]

index,user_id,recipe_id,date,rating,review
462145,165567,55,2006-03-31,5,I LOVED this recipe! I was looking for a guaca...
462147,851190,55,2010-05-23,5,I used Italian plum tomatoes for this as they ...
462144,53959,55,2006-01-12,4,I liked it. I was surprised since it didn't ha...
462146,1060485,55,2009-04-07,5,loved this! my family ate it all up! will deff...
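The second question, how many recipes have no reviews at all, is not answered by the cells above. One possible approach is to check which recipe ids never occur in the reviews table; a sketch on made-up frames:

```python
import pandas as pd

# Made-up data: recipes 48 and 91 have no reviews.
recipes_toy = pd.DataFrame({"id": [48, 55, 66, 91]})
reviews_toy = pd.DataFrame({"recipe_id": [55, 55, 66], "rating": [5, 4, 5]})

# True where the recipe id appears at least once among the reviews.
has_review = recipes_toy["id"].isin(reviews_toy["recipe_id"])

# Count the recipes for which the flag is False.
n_without_reviews = int((~has_review).sum())

print(n_without_reviews)
```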


4.3 Count the number of recipes broken down by year of creation.

In [38]:
recipes_per_year = recipes.groupby(recipes["submitted"].dt.year)["name"].count()
recipes_per_year

submitted
1999     275
2000     104
2001     589
2002    2644
2003    2334
2004    2153
2005    3130
2006    3473
2007    4429
2008    4029
2009    2963
2010    1538
2011     922
2012     659
2013     490
2014     139
2015      42
2016      24
2017      39
2018      24
Name: name, dtype: int64

In [39]:
recipes[recipes["submitted"].dt.year == 2018]

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
4932,Cauliflower ceviche,536547,45,2002234079,2018-07-30,15.0,a healthy ceviche - a perfect appetizer for pa...,8.0,106.0,2
5200,Cheesesteak stuffed onion rings,535783,60,33186,2018-05-10,10.0,surprise your family and friends with an onion...,11.0,110.0,4
8442,Creole watermelon feta salad,536729,10,1052873,2018-08-11,4.0,spicy watermelon salad. from tony chachere's s...,,50.0,4
9812,Easy bagel dogs weight watchers friendly,535120,95,700213,2018-02-07,22.0,this easy recipe is perfect if you’re watching...,,491.0,6
10978,Filet mignon,535584,130,33186,2018-04-11,,courtesy of chef kristen nguyen. must use high...,13.0,471.0,2
11614,Fruit loop treats,535104,15,2001968497,2018-02-06,,"fruity, slightly elevated, ridiculously easy a...",,104.0,3
12124,Gluten free vegemite,536728,2,1052873,2018-08-11,,gluten free vegemite-like stuff.,3.0,32.0,3
12300,Gotham rib steak,536098,142,33186,2018-06-26,13.0,recipe courtesy of old homestead steakhouse,4.0,43.0,3
13294,Hamburger potpie with homemade crust,536360,160,1801884905,2018-07-17,73.0,"the flaky, buttery crust is only the beginning...",,250.0,5
13380,Hawaiian cheesecake fruit salad,535713,15,219942,2018-04-27,22.0,"heavenly rich and fruity, a great summer fruit...",10.0,82.0,4


### Merging `pd.DataFrame` tables

In [40]:
recipes

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
0,George s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0,8
1,Healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0,5
2,I can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0,7
3,Italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0,3
4,Love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0,8
...,...,...,...,...,...,...,...,...,...,...
29995,Zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0,8
29996,Zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0,4
29997,Zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0,5
29998,Zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0,2


In [41]:
reviews

index,user_id,recipe_id,date,rating,review
370476,21752,57993,2003-05-01,5,Last week whole sides of frozen salmon fillet ...
624300,431813,142201,2007-09-16,5,So simple and so tasty! I used a yellow capsi...
187037,400708,252013,2008-01-10,4,"Very nice breakfast HH, easy to make and yummy..."
706134,2001852463,404716,2017-12-11,5,These are a favorite for the holidays and so e...
312179,95810,129396,2008-03-14,5,Excellent soup! The tomato flavor is just gre...
...,...,...,...,...,...
1013457,1270706,335534,2009-05-17,4,This recipe was great! I made it last night. I...
158736,2282344,8701,2012-06-03,0,This recipe is outstanding. I followed the rec...
1059834,689540,222001,2008-04-08,5,"Well, we were not a crowd but it was a fabulou..."
453285,2000242659,354979,2015-06-02,5,I have been a steak eater and dedicated BBQ gr...


5.1 Using a table merge, create a `DataFrame` consisting of four columns: `id`, `name`, `user_id`, `rating`. Recipes without reviews must be absent from this table.

In [42]:
new_df = pd.merge(recipes[["id", "name"]], reviews[["user_id", "rating", "recipe_id"]], how="inner", left_on="id", right_on="recipe_id")
new_df = new_df.drop(columns="recipe_id")
new_df.index.name = "index"
new_df

index,id,name,user_id,rating
0,44123,George s at the cove black bean soup,743566,5
1,44123,George s at the cove black bean soup,76503,5
2,44123,George s at the cove black bean soup,34206,5
3,67664,Healthy for them yogurt popsicles,494084,5
4,67664,Healthy for them yogurt popsicles,303445,5
...,...,...,...,...
126691,486161,Zydeco soup,305531,5
126692,486161,Zydeco soup,1271506,5
126693,486161,Zydeco soup,724631,5
126694,486161,Zydeco soup,133174,5


Confirm that your code works correctly by picking a recipe that has no reviews and displaying the row of the resulting `DataFrame` that would contain this recipe.

In [43]:
reviews[reviews['review'].isna()].shape[0]

17

In [44]:
reviews[reviews['review'].isna()].index

Int64Index([  56957,  273481,   56955,  783599, 1089381,  266359, 1078013,
             841853, 1020267,   56966, 1096338,  633479,  270484,  303674,
             510473,  774321,   26786],
           dtype='int64', name='index')

In [45]:
reviews.loc[[56957]]

index,user_id,recipe_id,date,rating,review
56957,2001567544,9054,2017-06-03,5,


In [46]:
new_df.loc[[56957]]

index,id,name,user_id,rating
56957,41231,Ham and provolone pinwheels,35193,5


In [47]:
print(reviews["review"].shape[0])
print(reviews["review"].dropna().shape[0])
print(new_df.shape[0])

126696
126679
126696
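
The cells above look for reviews whose `review` text is empty, which is not the same as a recipe that has no reviews, and `new_df.loc[[56957]]` selects by the new row label rather than by recipe. A minimal sketch of a more direct check is given below. It runs on small made-up frames; the toy data and the names `recipes_toy`, `reviews_toy` and `unreviewed_ids` are assumptions for illustration only.

```python
import pandas as pd

# Toy data (hypothetical): recipe 3 has no reviews.
recipes_toy = pd.DataFrame({"id": [1, 2, 3], "name": ["soup", "cake", "bread"]})
reviews_toy = pd.DataFrame(
    {"user_id": [10, 11, 12], "recipe_id": [1, 1, 2], "rating": [5, 4, 3]}
)

# Inner join keeps only recipes that have at least one review.
merged = recipes_toy.merge(
    reviews_toy, how="inner", left_on="id", right_on="recipe_id"
).drop(columns="recipe_id")

# Recipes whose id never occurs in the reviews table.
unreviewed_ids = recipes_toy.loc[
    ~recipes_toy["id"].isin(reviews_toy["recipe_id"]), "id"
]

# Selecting those ids from the merged frame should give no rows.
check = merged[merged["id"].isin(unreviewed_ids)]
print(unreviewed_ids.tolist())
print(len(check))
```

If `unreviewed_ids` turns out to be empty on the real data, every recipe has at least one review and there is nothing to exclude.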


5.2 Using table merges and grouping, create a `DataFrame` with three columns: `recipe_id`, `name`, `review_count`. For recipes that have no reviews, the corresponding column must contain 0. Confirm that your code works correctly by picking a recipe that has no reviews and displaying the row of the resulting `DataFrame` for that recipe.


In [48]:
other_new_df = recipes.merge(reviews, how='left', left_on='id', right_on='recipe_id')
other_new_df = recipes[['id', 'name']].merge(other_new_df['review'].groupby(other_new_df['name']).count(), how='left', on="name")
other_new_df = other_new_df.rename(columns={'id': 'recipe_id', 'review': 'review_count'})
other_new_df

Unnamed: 0,recipe_id,name,review_count
0,44123,George s at the cove black bean soup,3
1,67664,Healthy for them yogurt popsicles,8
2,38798,I can t believe it s spinach,3
3,35173,Italian gut busters,1
4,84797,Love is in the air beef fondue sauces,8
...,...,...,...
29995,267661,Zurie s holey rustic olive and cheddar bread,4
29996,386977,Zwetschgenkuchen bavarian plum cake,2
29997,103312,Zwiebelkuchen southwest german onion cake,6
29998,486161,Zydeco soup,6
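
Two details of the cell above are worth keeping in mind: grouping by `name` merges recipes that happen to share a name, and `count()` on the `review` column skips reviews with empty text. A hedged alternative sketch counts review rows per `recipe_id` and fills missing counts with 0. The toy frames and the name `review_counts` are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy data (hypothetical): recipes 1 and 3 share a name, recipe 3 has no reviews,
# and one review of recipe 1 has empty text.
recipes_toy = pd.DataFrame({"id": [1, 2, 3], "name": ["soup", "cake", "soup"]})
reviews_toy = pd.DataFrame(
    {"recipe_id": [1, 1, 2], "review": ["good", None, "ok"]}
)

# Number of review rows per recipe_id; size() counts rows, including empty texts.
counts = (
    reviews_toy.groupby("recipe_id").size().rename("review_count").reset_index()
)

# Left join keeps every recipe; recipes without reviews get NaN, replaced by 0.
review_counts = (
    recipes_toy[["id", "name"]]
    .rename(columns={"id": "recipe_id"})
    .merge(counts, how="left", on="recipe_id")
)
review_counts["review_count"] = review_counts["review_count"].fillna(0).astype(int)

# Rows for recipes that have no reviews.
print(review_counts[review_counts["review_count"] == 0])
```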


5.3 Find the year whose reviews have the lowest average rating.

In [49]:
reviews['rating'].groupby(pd.DatetimeIndex(reviews['date']).year).mean().sort_values()

date
2017    3.353042
2000    3.384615
2018    3.504076
2016    3.912603
2015    4.047145
2014    4.110595
2001    4.134426
2013    4.274168
2011    4.302045
2012    4.341427
2010    4.454440
2002    4.481556
2008    4.484635
2009    4.524728
2003    4.526940
2004    4.546548
2007    4.547077
2005    4.563680
2006    4.603673
Name: rating, dtype: float64
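
The sorted series puts the answer in its first row. As a small sketch, `idxmin` can return that year directly; the toy `reviews_toy` frame below is made up for illustration and assumes `date` holds strings that `pd.to_datetime` can parse.

```python
import pandas as pd

# Toy data (hypothetical).
reviews_toy = pd.DataFrame(
    {
        "date": ["2016-01-05", "2016-07-01", "2017-03-02", "2017-09-09"],
        "rating": [5, 4, 3, 2],
    }
)

# Year of each review, then the mean rating per year.
years = pd.to_datetime(reviews_toy["date"]).dt.year
mean_by_year = reviews_toy.groupby(years)["rating"].mean()

# Index label (year) of the smallest mean rating.
worst_year = mean_by_year.idxmin()
print(worst_year, mean_by_year.loc[worst_year])
```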

### Saving `pd.DataFrame` tables

6.1 Sort the table in descending order of the `name_word_count` column and save the results of tasks 3.1-3.3 to a CSV file.

In [50]:
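# sort_values returns a new DataFrame, so the result must be kept;
# ascending=False gives the descending order the task asks for.
recipes_sorted = recipes.sort_values(by='name_word_count', ascending=False)
recipes_sorted.to_csv("./data/6.1.csv")
recipes  # the original frame is left unchanged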

Unnamed: 0,name,id,minutes,contributor_id,submitted,n_steps,description,n_ingredients,description_length,name_word_count
0,George s at the cove black bean soup,44123,90,35193,2002-10-25,,an original recipe created by chef scott meska...,18.0,330.0,8
1,Healthy for them yogurt popsicles,67664,10,91970,2003-07-26,,my children and their friends ask for my homem...,,255.0,5
2,I can t believe it s spinach,38798,30,1533,2002-08-29,,"these were so go, it surprised even me.",8.0,39.0,7
3,Italian gut busters,35173,45,22724,2002-07-27,,my sister-in-law made these for us at a family...,,154.0,3
4,Love is in the air beef fondue sauces,84797,25,4470,2004-02-23,4.0,i think a fondue is a very romantic casual din...,,587.0,8
...,...,...,...,...,...,...,...,...,...,...
29995,Zurie s holey rustic olive and cheddar bread,267661,80,200862,2007-11-25,16.0,this is based on a french recipe but i changed...,10.0,484.0,8
29996,Zwetschgenkuchen bavarian plum cake,386977,240,177443,2009-08-24,,"this is a traditional fresh plum cake, thought...",11.0,286.0,4
29997,Zwiebelkuchen southwest german onion cake,103312,75,161745,2004-11-03,,this is a traditional late summer early fall s...,,311.0,5
29998,Zydeco soup,486161,60,227978,2012-08-29,,this is a delicious soup that i originally fou...,,648.0,2
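
`sort_values` returns a new `DataFrame` rather than sorting in place, and its default order is ascending. The sketch below illustrates both points on a made-up frame and writes the CSV to an in-memory buffer instead of a file; `recipes_toy`, `sorted_toy` and the buffer are assumptions for illustration.

```python
import io

import pandas as pd

# Toy data (hypothetical).
recipes_toy = pd.DataFrame(
    {
        "name": ["Zydeco soup", "Italian gut busters", "I can t believe it s spinach"],
        "name_word_count": [2, 3, 7],
    }
)

# The sorted copy is a new object; recipes_toy keeps its original order.
sorted_toy = recipes_toy.sort_values(by="name_word_count", ascending=False)

buffer = io.StringIO()  # stands in for a file path
sorted_toy.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` leaves the row labels out of the file; whether to keep them depends on whether they carry meaning.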


6.2 Using `pd.ExcelWriter`, save the results of 5.1 and 5.2 to a file: write the result of 5.1 to a sheet named `Рецепты с оценками` and the result of 5.2 to a sheet named `Количество отзывов по рецептам`.

In [52]:
with pd.ExcelWriter(r"./data/6.2.xlsx") as writer:
    new_df.to_excel(writer, sheet_name="Рецепты с оценками")
    other_new_df.to_excel(writer, sheet_name="Количество отзывов по рецептам")
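
Excel limits worksheet names to 31 characters and forbids a few characters in them, and the longer sheet name used here is close to that limit. The sketch below checks names before writing. The helpers `check_sheet_name` and `save_sheets` are hypothetical and not part of pandas, and the writing step assumes an Excel engine such as `openpyxl` is installed.

```python
import pandas as pd

INVALID_SHEET_CHARS = set("[]:*?/\\")
MAX_SHEET_NAME_LEN = 31  # Excel's limit on worksheet name length


def check_sheet_name(name):
    """Return True if `name` satisfies Excel's basic worksheet-name rules."""
    has_valid_length = 0 < len(name) <= MAX_SHEET_NAME_LEN
    return has_valid_length and not (set(name) & INVALID_SHEET_CHARS)


def save_sheets(path, frames):
    """Write each DataFrame in `frames` (sheet name -> DataFrame) to one workbook."""
    bad = [name for name in frames if not check_sheet_name(name)]
    if bad:
        raise ValueError(f"invalid sheet names: {bad}")
    # Requires an Excel engine (for example openpyxl) to be installed.
    with pd.ExcelWriter(path) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=name, index=False)
```

A call might look like `save_sheets("./data/6.2.xlsx", {"Рецепты с оценками": new_df, "Количество отзывов по рецептам": other_new_df})`.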