## Processing a fishing store's customer survey (using the pandas library)

The file `Fishing.csv` contains the results of a survey about fishing: in the questionnaire, respondents described their most recent fishing trip in detail.

**Description of the variables in the dataframe:**


* `mode`: the chosen fishing mode: from the beach (`beach`), from a pier (`pier`), from a private boat (`boat`), or from a chartered boat (`charter`);

* `price`: the price of the chosen fishing mode;

* `catch`: the catch rate for the chosen fishing mode;

* `pbeach`: the price of fishing from the beach;

* `ppier`: the price of fishing from a pier;

* `pboat`: the price of fishing from a private boat;

* `pcharter`: the price of fishing from a chartered boat;

* `cbeach`: the catch rate when fishing from the beach;

* `cpier`: the catch rate when fishing from a pier;

* `cboat`: the catch rate when fishing from a private boat;

* `ccharter`: the catch rate when fishing from a chartered boat;

* `income`: monthly income.

More details about the survey and the study are available in the [paper](https://core.ac.uk/download/pdf/38934845.pdf) by J. Herriges and C. Kling, *"Nonlinear Income Effects in Random Utility Models"* (1999).

### Task 1

Load the table from the file `Fishing.csv` and store it in the dataframe `dat`.
Display the first 8 rows of the loaded dataframe.

In [None]:
import pandas as pd
import numpy as np

In [None]:
### To upload the data file from your computer to Colab (you can first download it from
### https://drive.google.com/file/d/1LszHqB1IHJN9T9jDXuHiRQ9svFVQf3tz/view):
# from google.colab import files
# files.upload()

In [None]:
dat = pd.read_csv("Fishing.csv")

The first 8 rows of the loaded dataset:

In [None]:
dat.head(8)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,mode,price,catch,pbeach,ppier,pboat,pcharter,cbeach,cpier,cboat,ccharter,income
0,0,1,charter,182.93,0.5391,157.93,157.93,157.93,182.93,0.0678,0.0503,0.2601,0.5391,7083.3317
1,1,2,charter,34.534,0.4671,15.114,15.114,10.534,34.534,0.1049,0.0451,0.1574,0.4671,1249.9998
2,2,3,boat,24.334,0.2413,161.874,161.874,24.334,59.334,0.5333,0.4522,0.2413,1.0266,3749.9999
3,3,4,pier,15.134,0.0789,15.134,15.134,55.93,84.93,0.0678,0.0789,0.1643,0.5391,2083.3332
4,4,5,boat,41.514,0.1082,106.93,106.93,41.514,71.014,0.0678,0.0503,0.1082,0.324,4583.332
5,5,6,charter,63.934,0.3975,192.474,192.474,28.934,63.934,0.5333,0.4522,0.1665,0.3975,4583.332
6,6,7,beach,51.934,0.0678,51.934,51.934,191.93,220.93,0.0678,0.0789,0.1643,0.5391,8750.001
7,7,8,charter,56.714,0.0209,15.134,15.134,21.714,56.714,0.0678,0.0789,0.0102,0.0209,2083.3332
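The `Unnamed: 0` and `Unnamed: 0.1` columns in the output above look like row indices written out by earlier saves of the file. That is an assumption; the source does not say where they come from. A minimal sketch of dropping such columns follows. It uses a small in-memory CSV built from two of the rows above (only a few columns kept) rather than the real file:

```python
import io

import pandas as pd

# Two rows copied from the output above; most columns are omitted for brevity.
csv_text = """Unnamed: 0,Unnamed: 0.1,mode,price,income
0,1,charter,182.93,7083.3317
1,2,charter,34.534,1249.9998
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns whose names do not start with "Unnamed".
cleaned = sample.loc[:, ~sample.columns.str.startswith("Unnamed")]
print(cleaned.columns.tolist())
```

Whether these columns should be dropped from `dat` depends on whether later steps rely on them; the notebook itself keeps them.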


In [None]:
# Vectorised natural logarithm of the income column
dat['log_income'] = np.log(dat['income'])

The dataset with a new column `log_income` containing the natural logarithm of the respondents' incomes:

In [None]:
dat.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,mode,price,catch,pbeach,ppier,pboat,pcharter,cbeach,cpier,cboat,ccharter,income,log_income
0,0,1,charter,182.93,0.5391,157.93,157.93,157.93,182.93,0.0678,0.0503,0.2601,0.5391,7083.3317,8.8655
1,1,2,charter,34.534,0.4671,15.114,15.114,10.534,34.534,0.1049,0.0451,0.1574,0.4671,1249.9998,7.130899
2,2,3,boat,24.334,0.2413,161.874,161.874,24.334,59.334,0.5333,0.4522,0.2413,1.0266,3749.9999,8.229511
3,3,4,pier,15.134,0.0789,15.134,15.134,55.93,84.93,0.0678,0.0789,0.1643,0.5391,2083.3332,7.641724
4,4,5,boat,41.514,0.1082,106.93,106.93,41.514,71.014,0.0678,0.0503,0.1082,0.324,4583.332,8.430182
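As a quick sanity check of the log transform, the sketch below applies `np.log` to a standalone Series holding the first three `income` values from the table above and confirms that `np.exp` recovers the original values. The names `income`, `log_income` and `recovered` are illustrative only.

```python
import numpy as np
import pandas as pd

# The first three income values from the output above.
income = pd.Series([7083.3317, 1249.9998, 3749.9999], name="income")

# np.log works element-wise on a whole Series and returns a Series.
log_income = np.log(income)

# Exponentiating the logarithm should give back the original incomes.
recovered = np.exp(log_income)

print(log_income.round(4).tolist())
print(np.allclose(recovered, income))
```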


In [None]:
dat['pdiff'] = abs(dat['price'] - dat['pbeach'])

The dataframe with a new column `pdiff`, which holds, for each respondent, the absolute difference between `price` and `pbeach`:

In [None]:
dat.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,mode,price,catch,pbeach,ppier,pboat,pcharter,cbeach,cpier,cboat,ccharter,income,log_income,pdiff
0,0,1,charter,182.93,0.5391,157.93,157.93,157.93,182.93,0.0678,0.0503,0.2601,0.5391,7083.3317,8.8655,25.0
1,1,2,charter,34.534,0.4671,15.114,15.114,10.534,34.534,0.1049,0.0451,0.1574,0.4671,1249.9998,7.130899,19.42
2,2,3,boat,24.334,0.2413,161.874,161.874,24.334,59.334,0.5333,0.4522,0.2413,1.0266,3749.9999,8.229511,137.54
3,3,4,pier,15.134,0.0789,15.134,15.134,55.93,84.93,0.0678,0.0789,0.1643,0.5391,2083.3332,7.641724,0.0
4,4,5,boat,41.514,0.1082,106.93,106.93,41.514,71.014,0.0678,0.0503,0.1082,0.324,4583.332,8.430182,65.416
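The same absolute difference can also be written with the `Series.abs()` method. The sketch below reproduces the calculation on the `price` and `pbeach` values of the first four rows shown above; the frame `sample` is illustrative and stands in for `dat`.

```python
import pandas as pd

# price and pbeach for the first four respondents in the output above.
sample = pd.DataFrame(
    {
        "price": [182.93, 34.534, 24.334, 15.134],
        "pbeach": [157.93, 15.114, 161.874, 15.134],
    }
)

# Subtract column-wise, then take the absolute value element-wise.
sample["pdiff"] = (sample["price"] - sample["pbeach"]).abs()

print(sample["pdiff"].round(3).tolist())
```

A `pdiff` of zero means the respondent's chosen mode costs the same as fishing from the beach, which is always the case when the chosen mode is `beach` itself.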


The mean price (`price`) that respondents paid for each fishing mode:

In [None]:
# Select the column first so that only `price` is aggregated
dat.groupby('mode')['price'].mean()

mode
beach      35.699493
boat       41.606813
charter    75.096942
pier       30.571326
Name: price, dtype: float64

The difference between the median and the mean price (`price`) that respondents paid for each fishing mode:

In [None]:
# Median minus mean of `price` within each fishing mode
dat.groupby('mode')['price'].agg(lambda x: x.median() - x.mean())

mode
beach     -16.391493
boat      -17.004813
charter   -18.226942
pier      -13.942326
Name: price, dtype: float64
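A negative median-minus-mean difference means the mean lies above the median, which is typical of a right-skewed distribution with a few expensive trips. The sketch below shows one way to get the mean, the median and their difference in a single table using named aggregation. The frame `toy` and its prices are made up for illustration and are not taken from `Fishing.csv`.

```python
import pandas as pd

# Made-up prices: each group has one large value that pulls the mean up.
toy = pd.DataFrame(
    {
        "mode": ["beach", "beach", "beach", "pier", "pier", "pier"],
        "price": [10.0, 20.0, 60.0, 5.0, 15.0, 40.0],
    }
)

# Named aggregation: keyword = name of the output column, value = aggregation.
summary = toy.groupby("mode")["price"].agg(mean="mean", median="median")
summary["median_minus_mean"] = summary["median"] - summary["mean"]

print(summary)
```

Keeping the mean and the median side by side makes it easier to see how large the gap is relative to the prices themselves.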

The dataframe with rows sorted by `income` in descending order:

In [None]:
dat.sort_values('income', ascending=False, inplace=True)
dat

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,mode,price,catch,pbeach,ppier,pboat,pcharter,cbeach,cpier,cboat,ccharter,income,log_income,pdiff
524,524,525,charter,240.586,0.5391,167.374,167.374,211.586,240.586,0.0678,0.0789,0.1643,0.5391,12499.99800,9.433484,73.212
656,656,657,boat,37.896,0.0238,202.112,202.112,37.896,61.896,0.1049,0.0451,0.0238,0.0468,12499.99800,9.433484,164.216
1085,1085,1086,boat,37.896,0.7369,227.376,227.376,37.896,62.896,0.2537,0.1498,0.7369,2.3101,12499.99800,9.433484,189.480
1118,1118,1119,boat,15.790,0.7369,180.006,180.006,15.790,40.790,0.2537,0.1498,0.7369,2.3101,12499.99800,9.433484,164.216
1145,1145,1146,charter,40.790,2.3101,180.006,180.006,15.790,40.790,0.2537,0.1498,0.7369,2.3101,12499.99800,9.433484,139.216
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
517,517,518,charter,48.674,0.0058,53.406,53.406,13.674,48.674,0.5333,0.4522,0.0156,0.0058,416.66668,6.032287,4.732
978,978,979,beach,7.998,0.5333,7.998,7.998,51.858,86.858,0.5333,0.4522,0.2413,1.0266,416.66668,6.032287,0.000
1180,1180,1181,beach,36.636,0.5333,36.636,36.636,61.146,96.146,0.5333,0.4522,0.1665,0.3975,416.66668,6.032287,0.000
878,878,879,boat,3.870,0.7369,22.446,22.446,3.870,28.870,0.2537,0.1498,0.7369,2.3101,416.66668,6.032287,18.576


The dataframe with rows sorted by `price` and then by `income`, both in ascending order:

In [None]:
dat.sort_values(['price', 'income'])

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,mode,price,catch,pbeach,ppier,pboat,pcharter,cbeach,cpier,cboat,ccharter,income,log_income,pdiff
1105,1105,1106,pier,1.290,0.4522,1.290,1.290,39.990,74.990,0.5333,0.4522,0.0051,1.0266,416.66668,6.032287,0.000
1157,1157,1158,pier,1.290,0.4522,1.290,1.290,39.990,74.990,0.5333,0.4522,0.1665,0.3975,416.66668,6.032287,0.000
1013,1013,1014,beach,1.290,0.5333,1.290,1.290,39.990,74.990,0.5333,0.4522,0.2413,1.0266,416.66668,6.032287,0.000
275,275,276,pier,2.290,0.0789,2.290,2.290,2.290,31.290,0.0678,0.0789,0.0971,0.1648,1249.99980,7.130899,0.000
1155,1155,1156,pier,2.290,0.4522,2.290,2.290,70.990,105.990,0.5333,0.4522,0.2413,1.0266,1249.99980,7.130899,0.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
377,377,378,boat,328.432,0.0023,511.596,511.596,328.432,352.432,0.1049,0.0451,0.0023,0.0046,12499.99800,9.433484,183.164
588,588,589,charter,330.072,0.0052,106.112,106.112,305.072,330.072,0.2537,0.1498,0.0531,0.0052,6250.00130,8.740337,223.960
446,446,447,charter,335.314,1.0266,578.048,578.048,300.314,335.314,0.5333,0.4522,0.2413,1.0266,8750.00100,9.076809,242.734
211,211,212,charter,387.208,2.3014,115.248,115.248,362.208,387.208,0.2537,0.1498,0.6817,2.3014,7916.66630,8.976725,271.960
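When sorting by several columns, the second column only breaks ties in the first, and each column can have its own direction. The sketch below illustrates this on a made-up frame `toy` (not data from the survey), sorting `price` in ascending and `income` in descending order.

```python
import pandas as pd

# Made-up values with ties in `price` so that `income` decides the order.
toy = pd.DataFrame(
    {
        "price": [2.0, 1.0, 2.0, 1.0],
        "income": [100.0, 300.0, 400.0, 200.0],
    }
)

# One ascending flag per sort key: price ascending, income descending.
by_both = toy.sort_values(["price", "income"], ascending=[True, False])

# Without inplace=True the original frame keeps its row order.
print(by_both)
print(toy.index.tolist())
```

The original row labels travel with the rows, which is why the index in the sorted outputs above is no longer in increasing order.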


As the sorted output suggests, lower-income respondents mostly choose shore-based fishing (`pier` and `beach`), while higher-income respondents mostly fish from a vessel (`boat` and `charter`).
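The first and last rows of a sorted table give only a rough impression. One way to probe the link between income and fishing mode more directly is to compare the median income within each mode. The sketch below shows the mechanics on the eight rows printed by `dat.head(8)` earlier; the frame `sample` is illustrative, and on the full data the same call would be applied to `dat`.

```python
import pandas as pd

# mode and income for the eight rows shown by dat.head(8).
sample = pd.DataFrame(
    {
        "mode": ["charter", "charter", "boat", "pier",
                 "boat", "charter", "beach", "charter"],
        "income": [7083.3317, 1249.9998, 3749.9999, 2083.3332,
                   4583.332, 4583.332, 8750.001, 2083.3332],
    }
)

# Median monthly income of the respondents who chose each mode.
median_income = sample.groupby("mode")["income"].median().sort_values()

print(median_income)
```

Eight rows are far too few to confirm or refute the observation above, so the printed values here should not be read as a result about the survey.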

Checking the dataframe for missing values:

In [None]:
dat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1182 entries, 524 to 1013
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    1182 non-null   int64  
 1   Unnamed: 0.1  1182 non-null   int64  
 2   mode          1182 non-null   object 
 3   price         1182 non-null   float64
 4   catch         1182 non-null   float64
 5   pbeach        1182 non-null   float64
 6   ppier         1182 non-null   float64
 7   pboat         1182 non-null   float64
 8   pcharter      1182 non-null   float64
 9   cbeach        1182 non-null   float64
 10  cpier         1182 non-null   float64
 11  cboat         1182 non-null   float64
 12  ccharter      1182 non-null   float64
 13  income        1182 non-null   float64
 14  log_income    1182 non-null   float64
 15  pdiff         1182 non-null   float64
dtypes: float64(13), int64(2), object(1)
memory usage: 157.0+ KB


In [None]:
dat.isnull().sum()

Unnamed: 0      0
Unnamed: 0.1    0
mode            0
price           0
catch           0
pbeach          0
ppier           0
pboat           0
pcharter        0
cbeach          0
cpier           0
cboat           0
ccharter        0
income          0
log_income      0
pdiff           0
dtype: int64

Every count is zero, so the dataframe contains no missing values.
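Since the survey data has no gaps, the check above never shows a non-zero count. The sketch below runs the same check on a made-up frame `toy` that does contain missing values, to show what the output looks like in that case and how to reduce it to a single yes/no answer.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column.
toy = pd.DataFrame(
    {
        "price": [10.0, np.nan, 30.0],
        "income": [1000.0, 2000.0, np.nan],
        "mode": ["beach", None, "pier"],
    }
)

# Number of missing values per column (isna is an alias of isnull).
missing_per_column = toy.isna().sum()

# True if at least one value anywhere in the frame is missing.
has_missing = bool(toy.isna().any().any())

print(missing_per_column)
print(has_missing)
```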