## Introduction

In this micro-course, you'll learn all about pandas, the most popular Python library for data analysis.

Along the way, you'll complete several hands-on exercises with real-world data. We recommend that you work on the exercises while reading the corresponding tutorials.

In this tutorial, you will learn how to create your own data, along with how to work with data that already exists.

In [1]:
import pandas as pd

## Creating data

There are two core objects in pandas: the DataFrame and the Series.

### DataFrame
A DataFrame is a table. It contains an array of individual <i>entries</i>, each of which has a certain <i>value</i>. Each entry corresponds to a row (or record) and a column.

For example, consider the following simple DataFrame:

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

|  | Yes | No |
|---|---|---|
| 0 | 50 | 131 |
| 1 | 21 | 2 |


In this example, the "0, No" entry has the value of 131. The "0, Yes" entry has a value of 50, and so on.

DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [3]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

|  | Bob | Sue |
|---|---|---|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


We are using the `pd.DataFrame()` constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (`Bob` and `Sue` in this example) and whose values are lists of entries. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an `index` parameter in our constructor:

In [4]:
pd.DataFrame(
    {"Bob": ["I liked it.", "It was awful."], "Sue": ["Pretty good.", "Bland."]},
    index=["Product A", "Product B"],
)

|  | Bob | Sue |
|---|---|---|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | Bland. |
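
As a small illustration of the `index` parameter, the sketch below rebuilds the same table and inspects its row and column labels. The variable name `reviews` is an assumption made for this example only.

```python
import pandas as pd

# Rebuild the review table with custom row labels.
reviews = pd.DataFrame(
    {"Bob": ["I liked it.", "It was awful."], "Sue": ["Pretty good.", "Bland."]},
    index=["Product A", "Product B"],
)

# The row labels are stored in the DataFrame's index ...
print(list(reviews.index))    # ['Product A', 'Product B']
# ... and the column labels in its columns.
print(list(reviews.columns))  # ['Bob', 'Sue']
```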


### Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [5]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an `index` parameter. However, a Series does not have a column name; it only has one overall `name`:

In [6]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". We'll see more of this in the next section of this tutorial.
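
One way to picture the "glued together" idea is the rough sketch below, which builds a DataFrame from two Series and then pulls one column back out. The `Product B` figures and the variable names are made up for illustration, and the bracket-style column selection is only used here to show the round trip.

```python
import pandas as pd

# Two Series that share the same row labels (Product B values are hypothetical).
years = ["2015 Sales", "2016 Sales", "2017 Sales"]
product_a = pd.Series([30, 35, 40], index=years, name="Product A")
product_b = pd.Series([21, 34, 11], index=years, name="Product B")

# Glue them side by side: each Series becomes one column of the table.
sales = pd.DataFrame({product_a.name: product_a, product_b.name: product_b})
print(sales)

# Pulling a single column back out returns a Series again.
column = sales["Product A"]
print(type(column).__name__)  # Series
```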

## Reading data files
Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:

```
Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
```

So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.
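
To make the format concrete, here is a minimal sketch that parses a small CSV table held in a string rather than in a file. Using `io.StringIO` as a stand-in for a file, and the names `csv_text` and `products`, are assumptions made for this example.

```python
import io
import pandas as pd

# A small CSV table held in a string instead of a file on disk.
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# read_csv accepts file-like objects, so StringIO stands in for a real file.
products = pd.read_csv(io.StringIO(csv_text))
print(products)
```

The first line of the text becomes the column labels, and each remaining line becomes one row.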

Let's now set aside our toy datasets and see what a real dataset looks like when we read it into a DataFrame. We'll use the `pd.read_csv()` function to do this:

In [7]:
wine_reviews = pd.read_csv("./input/winemag-data-130k-v2.csv")

We can use the `shape` attribute to check how large the resulting DataFrame is:

In [8]:
wine_reviews.shape

(129971, 14)

So our new DataFrame has nearly 130,000 records split across 14 different columns. That's about 1.8 million entries!
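
The entry count is simply rows times columns. The sketch below checks that arithmetic on a toy DataFrame (the name `toy` is an assumption for this example) and then applies it to the shape reported above.

```python
import pandas as pd

# A toy DataFrame with 2 rows and 2 columns.
toy = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows * n_cols)      # 4
print(toy.size)             # 4, the same count computed by pandas

# The same arithmetic for the shape reported above.
print(129971 * 14)          # 1819594
```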

We can examine the contents of the resulting DataFrame using the `head()` method, which returns the first five rows by default:

In [9]:
wine_reviews.head()

|  | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
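
`head()` also accepts an optional row count. The short sketch below uses a hypothetical seven-row DataFrame named `numbers` to show the default and an explicit count side by side.

```python
import pandas as pd

# A hypothetical seven-row DataFrame, just large enough to show truncation.
numbers = pd.DataFrame({"n": list(range(7))})

print(len(numbers.head()))   # 5, the default number of rows
print(len(numbers.head(3)))  # 3, when a row count is passed explicitly
```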


The `pd.read_csv()` function is highly configurable, with over 30 optional parameters you can specify. For example, you can see that the CSV file in this dataset has a built-in index, which pandas did not pick up on automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an `index_col`.

In [10]:
wine_reviews = pd.read_csv("./input/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

|  | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
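
The effect of `index_col` is easier to see on a tiny file. The sketch below is an illustration under stated assumptions: the CSV text is made up, `io.StringIO` stands in for a file on disk, and the names `default_read` and `indexed_read` are hypothetical.

```python
import io
import pandas as pd

# A made-up CSV whose first column is an unnamed, saved row index.
csv_text = """,country,points
0,Italy,87
1,Portugal,87
2,US,87
"""

# Without index_col, the saved index is read as an ordinary column
# (pandas labels it "Unnamed: 0") and a fresh index is created on top.
default_read = pd.read_csv(io.StringIO(csv_text))
print(list(default_read.columns))  # ['Unnamed: 0', 'country', 'points']

# With index_col=0, the first column becomes the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_read.columns))  # ['country', 'points']
print(list(indexed_read.index))    # [0, 1, 2]
```

Note that using the column as the index removes it from the regular columns, so the second read has one column fewer than the first.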
