Exercises for [Kaggle's pandas tutorial](https://www.kaggle.com/learn/pandas/course)
# 1.Creating, Reading and Writing
[Creating, Reading and Writing](https://www.kaggle.com/code/residentmario/creating-reading-and-writing/data)
## 1.1.Creating data

```python
import pandas as pd

# Each dict key becomes a column label; each list becomes that column's values
d1 = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
d2 = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})
```

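A `DataFrame` can be thought of as a collection of "dictionary" entries. Each key is a column name (a string), and each value is a list-like column whose elements are labeled by default with the integers 0, 1, 2, …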
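This sketch illustrates the dictionary analogy by rebuilding `d1` from the cell above. The dictionary keys become the column labels. Because no `index` is passed, pandas labels the rows 0, 1, and so on. A round trip through `to_dict(orient='list')` then recovers the original dictionary.

```python
import pandas as pd

# Same construction as d1 above: each key becomes a column label
d1 = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

print(list(d1.columns))  # column labels come from the dict keys
print(list(d1.index))    # default row labels: 0, 1, ...

# orient='list' maps each column label back to a plain Python list
print(d1.to_dict(orient='list'))
```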

```python
print(d1['Yes'])     # the whole 'Yes' column
print("-----")
print(d1['Yes'][1])  # one element of that column, looked up by its label 1
```
```txt
0    50
1    21
Name: Yes, dtype: int64
-----
21
```


The row labels (the index) do not have to be integers:

```python
d3 = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                   'Sue': ['Pretty good.', 'Bland.']},
                  index=['Product A', 'Product B'])
print(d3['Bob'])
print("-----")
print(d3['Bob']['Product A'])
print("-----")
# d3['Bob'][0] only works through a deprecated positional fallback
# (recent pandas emits a FutureWarning); use d3['Bob'].iloc[0] instead
d3
```
```txt
Product A      I liked it.
Product B    It was awful.
Name: Bob, dtype: object
-----
I liked it.
-----
```


|  | Bob | Sue |
|---|---|---|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | Bland. |
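The index of `d3` holds strings. The unambiguous way to get the first entry of a column by position is therefore `.iloc`, while `.loc` looks up a row label. This minimal sketch rebuilds `d3` from above, and both lookups return the same value here.

```python
import pandas as pd

d3 = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                   'Sue': ['Pretty good.', 'Bland.']},
                  index=['Product A', 'Product B'])
bob = d3['Bob']

by_label = bob.loc['Product A']  # look up by row label
by_position = bob.iloc[0]        # look up by integer position
print(by_label == by_position)
```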


A `Series` is the "dictionary" entry mentioned above. If a `DataFrame` is a table, then a `Series` is a list.

```python
s1 = pd.Series([1, 2, 3, 4, 5])
s1
```
```txt
0    1
1    2
2    3
3    4
4    5
dtype: int64
```

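A `Series` is, in essence, a single column of a `DataFrame`. You can therefore assign row labels to a `Series` the same way as before, using the `index` parameter. However, a `Series` does not have a column name; it only has one overall `name`: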

```python
s2 = pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')
print(s2)
print('-----')
print(s2['2015 Sales'])
```
```txt
2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
-----
30
```


`Series` and `DataFrame` are closely related. It helps to think of a `DataFrame` as just a bunch of `Series` "glued together".
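This minimal sketch makes the "glued together" picture literal. It places two `Series` that share the same row labels side by side with `pd.concat(..., axis=1)`, and each `Series` name becomes a column label. The `Product A` values come from `s2` above. The `Product B` numbers are made up for illustration.

```python
import pandas as pd

# Two Series sharing the same row labels (Product B values are hypothetical)
labels = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=labels, name='Product A')
product_b = pd.Series([21, 34, 11], index=labels, name='Product B')

# axis=1 places the Series side by side, so each one becomes a column
df = pd.concat([product_a, product_b], axis=1)
print(df)

# Selecting a column hands back the original values as a Series
print(df['Product A'].equals(product_a))
```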
## 1.2.Reading data files

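Being able to create a `DataFrame` or `Series` by hand is handy. Most of the time, though, we won't be creating our own data by hand; instead, we'll be working with data that already exists.  
Data can be stored in many different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file, you get something that looks like this: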
```txt
Product A,Product B,Product C,
30,21,9,
35,34,1,
41,11,11
```
So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.  
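This sketch reads the toy CSV above without needing a file on disk. `pd.read_csv()` accepts any file-like object, so `io.StringIO` stands in for the file. The trailing commas from the sample are dropped here as a simplifying assumption; with them, pandas would add an extra, empty column.

```python
import io

import pandas as pd

# The toy CSV from above, minus its trailing commas
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# io.StringIO wraps the string so read_csv can treat it like an open file
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (rows, columns)
```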
Now let's set our toy datasets aside and see what a real dataset looks like when we read it into a `DataFrame`. We'll use the `pd.read_csv()` function to read the data into a `DataFrame`, like so:

```python
wine_reviews = pd.read_csv("./winemag-data-130k-v2.csv")
```

We can use the `shape` attribute to check how large the resulting `DataFrame` is:

```python
wine_reviews.shape  # (number of rows, number of columns)
```
```txt
(129971, 14)
```

We can examine the contents of the resulting `DataFrame` with the `head()` method, which grabs the first five rows:

```python
wine_reviews.head()
```

|  | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
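`head()` takes an optional row count, and `tail()` is its counterpart for the end of the frame. This sketch uses a small hypothetical ten-row frame as a stand-in for `wine_reviews`.

```python
import pandas as pd

# Hypothetical stand-in for wine_reviews: ten rows of points
df = pd.DataFrame({'points': range(80, 90)})

print(df.head())   # first five rows (n defaults to 5)
print(df.head(3))  # first three rows
print(df.tail(2))  # last two rows
```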


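The `pd.read_csv()` function is well-endowed, with over 30 optional parameters you can specify. For example, you can see in this dataset that the CSV file has a built-in index (the `Unnamed: 0` column), which pandas did not pick up on automatically. To make pandas use that column as the index instead of creating a new one from scratch, we can specify an `index_col`: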

```python
# index_col=0: use the file's first column as the row index
wine_reviews = pd.read_csv("./winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
```

|  | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
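This sketch shows exactly what `index_col` changes. It reads a hypothetical two-row extract shaped like the wine CSV, whose first column has an empty header, once without `index_col` and once with `index_col=0`.

```python
import io

import pandas as pd

# Hypothetical extract: the first column has an empty header and holds row ids
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

plain = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))    # the id column is kept as data, named 'Unnamed: 0'
print(list(indexed.columns))  # with index_col=0 it becomes the row index instead
print(list(indexed.index))
```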


# 2.Indexing, Selecting & Assigning
[Indexing, Selecting & Assigning](https://www.kaggle.com/code/residentmario/indexing-selecting-assigning)

## 2.1.Native accessors
Native Python objects provide good ways of indexing data. pandas carries all of these over, which makes it easy to get started.  
Consider this `DataFrame`:

```python
reviews = wine_reviews  # a second name for the same DataFrame (no copy is made)
reviews
```

|  | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel |  |  | Anna Lee C. Iijima |  | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef) |
| 129967 | US | Citation is given as much as a decade of bottl... |  | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation |
| 129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser |
| 129969 | France | A dry style of Pinot Gris, this is crisp with ... |  | 90 | 32.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |


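In Python, we can access a property of an object as an attribute. A `book` object, for example, might have a `title` property, which we can access with `book.title`. Columns in a pandas `DataFrame` work in much the same way.  
So, to access the `country` property of `reviews`, we can use: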

```python
reviews.country
```
```txt
0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object
```

If we have a Python dictionary, we can access its values using the indexing (`[]`) operator. We can do the same with columns in a `DataFrame`:

```python
reviews['country']
```
```txt
0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object
```

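These are the two ways of selecting a specific `Series` out of a `DataFrame`. Neither is more or less syntactically valid than the other. The indexing operator `[]` does have one advantage: it can handle column names that contain spaces. For example, if we had a `country providence` column, `reviews.country providence` would not work.  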
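This sketch shows two cases where only bracket access works. The first is a column name with a space. The second is a column named after an existing `DataFrame` method (here `count`), where the attribute returns the method rather than the column. Both column names are hypothetical.

```python
import pandas as pd

# Hypothetical columns: one with a space, one that shadows DataFrame.count
df = pd.DataFrame({'country providence': ['Sicily', 'Douro'],
                   'count': [3, 5]})

print(df['country providence'])  # brackets accept any column label
# df.country providence is a SyntaxError, so attribute access is impossible here

print(callable(df.count))  # df.count is the DataFrame.count method, not the column
print(df['count'])         # brackets reach the column unambiguously
```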
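Doesn't a pandas `Series` look kind of like a fancy dictionary? It pretty much is. So it's no surprise that, to drill down to a single specific value, we only need to use the indexing operator `[]` once more: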

```python
reviews['country'][0]
```
```txt
'Italy'
```
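The dictionary analogy goes further than `[]`. On a `Series`, `in` tests the index labels (like a dict's keys, not its values), and `.get()` takes a default for missing labels. This sketch uses a small stand-in for `reviews['country']`.

```python
import pandas as pd

# Small stand-in for reviews['country']
s = pd.Series(['Italy', 'Portugal', 'US'])

print(s[0])                  # label lookup, like d[key]
print(1 in s)                # `in` checks the index labels, like a dict's keys
print('Italy' in s)          # False: the values are not searched
print('Italy' in s.values)   # search the values explicitly
print(s.get(10, 'missing'))  # dict-style .get with a default
```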

## 2.2.Indexing in pandas
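The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem, which makes them easy for a novice to pick up and use. However, pandas has its own accessor operators, `loc` and `iloc`. For more advanced operations, these are the ones you should be using.  
**Index-based selection**  
Indexing in pandas works in one of two paradigms. The first is **index-based selection**: selecting data based on its numerical position in the data. `iloc` follows this paradigm.
To select the first row of data in a `DataFrame`, we can use the following: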

```python
reviews.iloc[0]
```
```txt
country                                                              Italy
description              Aromas include tropical fruit, broom, brimston...
designation                                                   Vulkà Bianco
points                                                                  87
price                                                                  NaN
province                                                 Sicily & Sardinia
region_1                                                              Etna
region_2                                                               NaN
taster_name                                                  Kerin O’Keefe
taster_twitter_handle                                         @kerinokeefe
title                                    Nicosia 2013 Vulkà Bianco  (Etna)
variety                                                        White Blend
winery                                                             Nicosia
Name: 0, dtype: object
```
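The output above is itself a `Series`. Its index is the column labels, and its `name` is the row label (`Name: 0`). This sketch checks that on a hypothetical three-row frame shaped like `reviews`.

```python
import pandas as pd

# Hypothetical three-row frame shaped like reviews
df = pd.DataFrame({'country': ['Italy', 'Portugal', 'US'],
                   'points': [87, 87, 87],
                   'price': [None, 15.0, 14.0]})

row = df.iloc[0]
print(type(row).__name__)  # a single row comes back as a Series
print(list(row.index))     # indexed by the column labels
print(row.name)            # named after the row label
print(row['country'])
```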

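Both `loc` and `iloc` are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second (as in `reviews['country'][0]` above).  
> Note: this refers specifically to indexing into a `DataFrame`. It also raises a natural question: what is the difference between `loc` and `iloc`?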
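In short, `iloc` selects by integer position and `loc` selects by index label. The difference is invisible when the labels happen to be 0, 1, 2, …, so this sketch uses a hypothetical frame with string labels. It also shows the slicing difference: `iloc` excludes the stop position, while `loc` includes the stop label.

```python
import pandas as pd

# Hypothetical frame whose row labels are strings, not positions
df = pd.DataFrame({'points': [87, 90, 85, 92]},
                  index=['a', 'b', 'c', 'd'])

print(df.iloc[0])   # first row, by position
print(df.loc['a'])  # the same row, by label

# Slices: iloc stops before position 2, loc includes label 'c'
print(df.iloc[0:2])     # rows 'a', 'b'
print(df.loc['a':'c'])  # rows 'a', 'b', 'c'

# Both accessors take the row first, then the column
print(df.iloc[1, 0], df.loc['b', 'points'])
```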


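This means that it's marginally easier to retrieve rows and marginally harder to retrieve columns. To get a column with `iloc`, we can do the following: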

```python
reviews.iloc[:, 0]  # every row, column at position 0
```
```txt
0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object
```
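The column position can also be negative or a list. When you only know a column's name, `df.columns.get_loc()` converts it to a position. This sketch uses a hypothetical two-row frame.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Portugal'],
                   'points': [87, 87],
                   'price': [None, 15.0]})

first_col = df.iloc[:, 0]       # every row, column at position 0
last_col = df.iloc[:, -1]       # every row, last column
some_cols = df.iloc[:, [0, 2]]  # a list of positions returns a DataFrame

# Translate a column label into its position, then select by position
pos = df.columns.get_loc('points')
points = df.iloc[:, pos]
```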

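On its own, the `:` operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can indicate a range of values. For example, to select the `country` column from just the first, second, and third rows, we would do: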

```python
reviews.iloc[:3, 0]
```
```txt
0       Italy
1    Portugal
2          US
Name: country, dtype: object
```

Or, to select just the second and third entries, we would do:

```python
reviews.iloc[1:3, 0]
```
```txt
1    Portugal
2          US
Name: country, dtype: object
```
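These slices follow the same half-open `[start, stop)` rule as Python list slicing, which is why `1:3` yields only positions 1 and 2. This sketch checks that against a plain list holding the first five countries shown above.

```python
import pandas as pd

countries = ['Italy', 'Portugal', 'US', 'US', 'US']
s = pd.Series(countries)

# iloc slices behave exactly like list slices
print(s.iloc[:3].tolist() == countries[:3])    # True
print(s.iloc[1:3].tolist() == countries[1:3])  # True: positions 1 and 2 only
```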

It's also possible to pass a list:

```python
reviews.iloc[[0, 1, 2], 0]
```
```txt
0       Italy
1    Portugal
2          US
Name: country, dtype: object
```
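Unlike a slice, a list of positions can reorder or repeat entries, and the original index labels travel with the selected values. This sketch uses a small stand-in `Series`.

```python
import pandas as pd

s = pd.Series(['Italy', 'Portugal', 'US'])

# Positions 2, 0, 0: reordered, with position 0 picked twice
picked = s.iloc[[2, 0, 0]]
print(picked)
```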

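Finally, it's worth knowing that negative numbers can be used in selection; they count from the end of the values. For example, here are the last five elements of the dataset: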

```python
reviews.iloc[-5:]  # negative positions count from the end
```

|  | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel |  |  | Anna Lee C. Iijima |  | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef) |
| 129967 | US | Citation is given as much as a decade of bottl... |  | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation |
| 129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser |
| 129969 | France | A dry style of Pinot Gris, this is crisp with ... |  | 90 | 32.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
| 129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |


```python
reviews.iloc[:-5]  # by the same logic, keeps every row except the last five
```

|  | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 |  | Sicily & Sardinia | Etna |  | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro |  |  | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... |  | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore |  | Alexander Peartree |  | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129961 | Italy | Intense aromas of wild cherry, baking spice, t... |  | 90 | 30.0 | Sicily & Sardinia | Sicilia |  | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS |
| 129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia |  | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano |
| 129963 | Israel | A bouquet of black cherry, tart cranberry and ... | Oak Aged | 90 | 20.0 | Galilee |  |  | Mike DeSimone | @worldwineguys | Dalton 2012 Oak Aged Cabernet Sauvignon (Galilee) | Cabernet Sauvignon | Dalton |
| 129964 | France | Initially quite muted, this wine slowly develo... | Domaine Saint-Rémy Herrenweg | 90 |  | Alsace | Alsace |  | Roger Voss | @vossroger | Domaine Ehrhart 2013 Domaine Saint-Rémy Herren... | Gewürztraminer | Domaine Ehrhart |
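The guess about `iloc[:-5]` holds. `iloc[-5:]` selects the same rows as `tail(5)`, and `iloc[:-5]` keeps everything else. For the 129,971-row dataset, that means 129,966 rows labeled 0 through 129965. This sketch checks both facts on a hypothetical 12-row stand-in.

```python
import pandas as pd

# Hypothetical 12-row stand-in for reviews
df = pd.DataFrame({'points': range(12)})

last_five = df.iloc[-5:]     # the same rows as df.tail(5)
all_but_last = df.iloc[:-5]  # everything except those five rows

print(last_five.equals(df.tail(5)))
print(len(all_but_last) + len(last_five) == len(df))
```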


## 2.3.Manipulating the index

## 2.4.Conditional selection

## 2.5.Assigning data