## Importing pandas
Getting started and checking your pandas setup

### 1. Import pandas under the alias pd.

In [8]:
import pandas as pd

### 2. Print the version of pandas that has been imported.

In [9]:
print(pd.__version__)

0.25.3


### 3. Print out all the version information of the libraries that are required by the pandas library.

In [7]:
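# show_versions() prints its report itself and returns None,
# so wrapping it in print() would only append a stray "None".
pd.show_versions()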


INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.1.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.5.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.3
numpy            : 1.17.3
pytz             : 2018.7
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 39.0.1
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : None
pymysql          : 0.9.3
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.1.1
numexpr          :

## DataFrame basics
A few of the fundamental routines for selecting, sorting, adding and aggregating data in a DataFrame. The exercises in this section use the following dictionary data and list labels:

``` python  
import numpy as np
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

### 4. Create a DataFrame df from the dictionary data, using labels as its index.

In [17]:
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Assign the result to df so that the later exercises can refer to it.
df = pd.DataFrame(data, index=labels)
df

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no


### 5. Display a summary of the basic information about this DataFrame and its data (hint: there is a single method that can be called on the DataFrame)

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int64
priority    10 non-null object
dtypes: float64(1), int64(1), object(2)
memory usage: 720.0+ bytes


### 6. Return the first 3 rows of the DataFrame df.

In [20]:
df.head(3)

  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no


### 7. Select just the 'animal' and 'age' columns from the DataFrame df.

In [27]:
print('Method 1')
print(df[['age', 'animal']])

print('Method 2')
print(df.loc[:, ['animal', 'age']])

Method 1
   age animal
a  2.5    cat
b  3.0    cat
c  0.5  snake
d  NaN    dog
e  5.0    dog
f  2.0    cat
g  4.5  snake
h  NaN    cat
i  7.0    dog
j  3.0    dog
Method 2
  animal  age
a    cat  2.5
b    cat  3.0
c  snake  0.5
d    dog  NaN
e    dog  5.0
f    cat  2.0
g  snake  4.5
h    cat  NaN
i    dog  7.0
j    dog  3.0


### 8. Select the data in rows [3, 4, 8] and in columns ['animal', 'age']

In [40]:
# Select rows and columns in a single .loc call instead of chaining two lookups.
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

  animal  age
d    dog  NaN
e    dog  5.0
i    dog  7.0
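The row numbers in this exercise are positions, while the column names are labels. As an alternative sketch, the column labels can be converted to positions first so that the whole selection goes through `iloc`. The names `col_positions` and `subset` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Translate the column labels into integer positions ...
col_positions = df.columns.get_indexer(['animal', 'age'])

# ... then select rows and columns purely by position.
subset = df.iloc[[3, 4, 8], col_positions]
print(subset)
```

Both forms should return the same three rows; which one reads better is largely a matter of taste.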


### 9. Select only the rows where the number of visits is greater than 3.

In [43]:
df[df['visits']>3]

Empty DataFrame
Columns: [animal, age, visits, priority]
Index: []
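The result is empty because no animal in this data has more than 3 visits. A quick way to confirm that, sketched below, is to check the column maximum and the `empty` attribute. The variable names are illustrative, and the `>= 3` filter is added here only for contrast.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# The largest value in 'visits' is 3, so a strict "> 3" filter matches nothing.
max_visits = df['visits'].max()
more_than_three = df[df['visits'] > 3]

# For contrast, a non-strict comparison does match some rows.
at_least_three = df[df['visits'] >= 3]

print(max_visits)
print(more_than_three.empty)
print(at_least_three.index.tolist())
```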


### 10. Select the rows where the age is missing, i.e. it is NaN.

In [63]:
print('Method 1')
print(df[df['age'].isnull()])

print('Method 2')
print(df[df['age'].isna()])

Method 1
  animal  age  visits priority
d    dog  NaN       3      yes
h    cat  NaN       1      yes
Method 2
  animal  age  visits priority
d    dog  NaN       3      yes
h    cat  NaN       1      yes
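`isnull()` and `isna()` are aliases, which is why both methods return the same rows. The same boolean mask can also be summed to count missing values, or inverted with `notna()` to keep only complete rows. The following is a small sketch; `n_missing` and `with_age` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# True counts as 1 when summed, so this is the number of missing ages.
n_missing = df['age'].isna().sum()

# notna() is the complement of isna(): keep only rows with a known age.
with_age = df[df['age'].notna()]

print(n_missing)
print(len(with_age))
```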


### 11. Select the rows where the animal is a cat and the age is less than 3.

In [66]:
df[(df['animal']=='cat') & (df['age']<3)]

  animal  age  visits priority
a    cat  2.5       1      yes
f    cat  2.0       3       no
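The parentheses around each comparison are required because `&` binds more tightly than `==` and `<` in Python. As a hedged alternative, `DataFrame.query` accepts the same condition as a string, which some readers find easier to scan. The variable name `young_cats` is illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Column names are referenced directly inside the query string.
# Row 'h' (a cat with a missing age) is excluded because NaN < 3 is False.
young_cats = df.query("animal == 'cat' and age < 3")
print(young_cats)
```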


### 12. Select the rows where the age is between 2 and 4 (inclusive).

In [71]:
print('Method 1')
print(df[(df['age']>=2) & (df['age']<=4)])

print('Method 2')
print(df[df['age'].between(2, 4)])

Method 1
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
f    cat  2.0       3       no
j    dog  3.0       1       no
Method 2
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
f    cat  2.0       3       no
j    dog  3.0       1       no
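One detail worth noting: rows with a missing age fail both comparisons, so they never appear in the result. The flip side is that simply negating the mask brings them back. The sketch below illustrates this; the names `in_range`, `negated` and `outside_known` are illustrative.

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# NaN ages evaluate to False in the mask.
in_range = df['age'].between(2, 4)

# Negating the mask therefore also selects the rows with a missing age ...
negated = df[~in_range]

# ... so combine it with notna() to get only known ages outside the range.
outside_known = df[~in_range & df['age'].notna()]

print(negated.index.tolist())
print(outside_known.index.tolist())
```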


### 13. Change the age in row 'f' to 1.5.
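The notebook leaves this exercise unanswered. One possible solution, offered as a sketch and not as the author's own answer, is to assign through `loc` with both the row label and the column name:

```python
import numpy as np
import pandas as pd

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Assign by row label and column name in one step.
# A single .loc call writes to df itself, which a chained
# lookup such as df['age']['f'] = 1.5 is not guaranteed to do.
df.loc['f', 'age'] = 1.5
print(df.loc['f'])
```

For a single scalar, `df.at['f', 'age'] = 1.5` should behave the same way.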

## References

- [100-pandas-puzzles](https://github.com/ajcr/100-pandas-puzzles/blob/master/100-pandas-puzzles.ipynb)