# Introduction

<div><p>In the last two missions, we explored how the NumPy library makes working with data easier. Because we can easily work across multiple dimensions, our code is a lot easier to understand. By using vectorized operations instead of loops, our code runs faster with larger data.</p>
<p>Although NumPy provides fundamental structures and tools that make working with data easier, there are several things that limit its usefulness:</p>
<ul>
<li>The lack of support for column names forces us to frame questions as multi-dimensional array operations.</li>
<li>Support for only one data type per ndarray makes it more difficult to work with data that contains both numeric and string data.</li>
<li>There are lots of low-level methods, but many common analysis patterns don't have pre-built methods.</li>
</ul>
<p>The <strong>pandas</strong> library provides solutions to all of these pain points and more. Pandas is not so much a replacement for NumPy as an <em>extension</em> of NumPy. The underlying code for pandas uses the NumPy library extensively, which means the concepts you've been learning will come in handy as you begin to learn more about pandas.</p>
<p>The primary data structure in pandas is called a <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame" target="_blank"><strong>dataframe</strong></a>. Dataframes are the pandas equivalent of a NumPy 2D ndarray, with a few key differences:</p>
<ul>
<li>Axis values can have string <strong>labels</strong>, not just numeric ones.</li>
<li>Dataframes can contain columns with <strong>multiple data types</strong>: including integer, float, and string.</li>
</ul>
<p></p><center><img src="https://s3.amazonaws.com/dq-content/291/df_anatomy_static_resized.svg" alt="anatomy of a dataframe"></center><p></p></div>
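
To make the two differences above concrete, here is a minimal sketch that contrasts an ndarray with a dataframe. The variable names `mixed_array` and `companies` are our own, chosen for illustration, and the two rows are taken from the data set introduced on the next screen.

```python
import numpy as np
import pandas as pd

# An ndarray has one dtype, so mixing numbers and text turns every value into a string.
mixed_array = np.array([[1, "USA"], [2, "China"]])
print(mixed_array.dtype.kind)  # "U" means a Unicode string dtype

# A dataframe keeps a separate dtype per column and labels both axes.
companies = pd.DataFrame(
    {"rank": [1, 2], "country": ["USA", "China"]},
    index=["Walmart", "State Grid"],
)
print(companies.dtypes)
print(companies.loc["State Grid", "country"])  # China
```

Note that the `rank` column stays numeric in the dataframe, while the ndarray had to store it as text.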

# Introduction to the data

<div><p>As we learn pandas, we'll work with a data set from <a href="http://fortune.com/" target="_blank">Fortune</a> magazine's <a href="https://en.wikipedia.org/wiki/Fortune_Global_500" target="_blank">2017 Global 500 list</a>, which ranks the top 500 corporations worldwide by revenue. The data set was originally compiled <a href="https://data.world/chasewillden/fortune-500-companies-2017" target="_blank">here</a>; however, we modified the original data set to make it more accessible.</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/fortune-500.jpg" alt="fortune 500 cover"></p>
<p>The data set is a CSV file called <code>f500.csv</code>. Here is a data dictionary for some of the columns in the CSV:</p>
<ul>
<li><code>company</code>: Name of the company.</li>
<li><code>rank</code>: Global 500 rank for the company.</li>
<li><code>revenues</code>: Company's total revenue for the fiscal year, in millions of dollars (USD).</li>
<li><code>revenue_change</code>: Percentage change in revenue between the current and prior fiscal year.</li>
<li><code>profits</code>: Net income for the fiscal year, in millions of dollars (USD).</li>
<li><code>ceo</code>: Company's Chief Executive Officer.</li>
<li><code>industry</code>: Industry in which the company operates.</li>
<li><code>sector</code>: Sector in which the company operates.</li>
<li><code>previous_rank</code>: Global 500 rank for the company for the prior year.</li>
<li><code>country</code>: Country in which the company is headquartered.</li>
</ul>
<p>Similar to the import convention for NumPy (<code>import numpy as np</code>), the import convention for pandas is:</p>
</div>

```
import pandas as pd 
```

<div>
<p>In the <code>script.py</code> code editor for this screen, we have already imported pandas and used the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html" target="_blank"><code>pandas.read_csv()</code> function</a> to read the CSV into a dataframe and assign it to the variable name <code>f500</code>. We'll learn about <code>read_csv()</code> later in this course, but for now, all you need to know is that it automatically handles reading and parsing most CSV files.</p>
<p>Like NumPy's ndarrays, pandas' dataframes have a <code>.shape</code> attribute which returns a tuple representing the dimensions of each axis of the object. We'll use that and Python's <a href="https://docs.python.org/3.6/library/functions.html#type" target="_blank"><code>type()</code> function</a> to inspect the <code>f500</code> dataframe.</p></div>
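
As a rough illustration of what `type()` and `.shape` report, the sketch below uses a tiny hand-built dataframe rather than `f500`. The name `small_df` and its contents are assumptions made for this example only.

```python
import pandas as pd

# A hypothetical three-row, two-column dataframe.
small_df = pd.DataFrame(
    {"rank": [1, 2, 3], "revenues": [485873, 315199, 267518]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

small_type = type(small_df)   # the class of the object
small_shape = small_df.shape  # (number of rows, number of columns)

print(small_type)
print(small_shape)  # (3, 2)
```

The index labels are not counted as a column, which is why the shape is `(3, 2)` and not `(3, 3)`.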

### Instructions 

<ol>
<li>Use Python's <code>type()</code> function to assign the type of <code>f500</code> to <code>f500_type</code>.</li>
<li>Use the <code>DataFrame.shape</code> attribute to assign the shape of <code>f500</code> to <code>f500_shape</code>.</li>
<li>After you have run your code, use the variable inspector to look at the variables <code>f500</code>, <code>f500_type</code>, and <code>f500_shape</code>.</li>
</ol>

In [2]:
import pandas as pd
f500 = pd.read_csv('f500.csv',index_col=0)
f500.index.name = None

f500_type = type(f500)
f500_shape = f500.shape 

# Introducing Dataframes

<div><p>The code we wrote on the previous screen let us know our data has 500 rows and 16 columns, and is stored as a <code>pandas.core.frame.DataFrame</code> object, or just <strong>dataframe</strong>, the primary pandas data structure.</p>
<p>Recall that one of the features that makes pandas better for working with data is its support for string column and row labels:</p>
<ul>
<li><strong>Axis values can have string labels, not just numeric ones</strong>.</li>
<li>Dataframes can contain columns with multiple data types: including integer, float, and string.</li>
</ul>
<p>Let's verify this next. To view the first few rows of our dataframe, we can use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html" target="_blank"><code>DataFrame.head()</code> method</a>. By default, it will return the first five rows of our dataframe. However, it also accepts an optional integer parameter, which specifies the number of rows:</p>
</div>

```
f500.head(3)
```

<div>
<table class="dataframe">
<thead>
<tr>
<th></th>
<th>rank</th>
<th>revenues</th>
<th>revenue_change</th>
<th>profits</th>
<th>assets</th>
<th>profit_change</th>
<th>ceo</th>
<th>industry</th>
<th>sector</th>
<th>previous_rank</th>
<th>country</th>
<th>hq_location</th>
<th>website</th>
<th>years_on_global_500_list</th>
<th>employees</th>
<th>total_stockholder_equity</th>
</tr>
</thead>
<tbody>
<tr>
<th>Walmart</th>
<td>1</td>
<td>485873</td>
<td>0.8</td>
<td>13643.0</td>
<td>198825</td>
<td>-7.2</td>
<td>C. Douglas McMillon</td>
<td>General Merchandisers</td>
<td>Retailing</td>
<td>1</td>
<td>USA</td>
<td>Bentonville, AR</td>
<td>http://www.walmart.com</td>
<td>23</td>
<td>2300000</td>
<td>77798</td>
</tr>
<tr>
<th>State Grid</th>
<td>2</td>
<td>315199</td>
<td>-4.4</td>
<td>9571.3</td>
<td>489838</td>
<td>-6.2</td>
<td>Kou Wei</td>
<td>Utilities</td>
<td>Energy</td>
<td>2</td>
<td>China</td>
<td>Beijing, China</td>
<td>http://www.sgcc.com.cn</td>
<td>17</td>
<td>926067</td>
<td>209456</td>
</tr>
<tr>
<th>Sinopec Group</th>
<td>3</td>
<td>267518</td>
<td>-9.1</td>
<td>1257.9</td>
<td>310726</td>
<td>-65.0</td>
<td>Wang Yupu</td>
<td>Petroleum Refining</td>
<td>Energy</td>
<td>4</td>
<td>China</td>
<td>Beijing, China</td>
<td>http://www.sinopec.com</td>
<td>19</td>
<td>713288</td>
<td>106523</td>
</tr>
</tbody>
</table>
<p>Likewise, we can use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html" target="_blank"><code>DataFrame.tail()</code> method</a> to show us the last rows of our dataframe:</p>
</div>

```
f500.tail(3)
```

<div>
<table class="dataframe">
<thead>
<tr>
<th></th>
<th>rank</th>
<th>revenues</th>
<th>revenue_change</th>
<th>profits</th>
<th>assets</th>
<th>profit_change</th>
<th>ceo</th>
<th>industry</th>
<th>sector</th>
<th>previous_rank</th>
<th>country</th>
<th>hq_location</th>
<th>website</th>
<th>years_on_global_500_list</th>
<th>employees</th>
<th>total_stockholder_equity</th>
</tr>
</thead>
<tbody>
<tr>
<th>Wm. Morrison Supermarkets</th>
<td>498</td>
<td>21741</td>
<td>-11.3</td>
<td>406.4</td>
<td>11630</td>
<td>20.4</td>
<td>David T. Potts</td>
<td>Food and Drug Stores</td>
<td>Food &amp; Drug Stores</td>
<td>437</td>
<td>Britain</td>
<td>Bradford, Britain</td>
<td>http://www.morrisons.com</td>
<td>13</td>
<td>77210</td>
<td>5111</td>
</tr>
<tr>
<th>TUI</th>
<td>499</td>
<td>21655</td>
<td>-5.5</td>
<td>1151.7</td>
<td>16247</td>
<td>195.5</td>
<td>Friedrich Joussen</td>
<td>Travel Services</td>
<td>Business Services</td>
<td>467</td>
<td>Germany</td>
<td>Hanover, Germany</td>
<td>http://www.tuigroup.com</td>
<td>23</td>
<td>66779</td>
<td>3006</td>
</tr>
<tr>
<th>AutoNation</th>
<td>500</td>
<td>21609</td>
<td>3.6</td>
<td>430.5</td>
<td>10060</td>
<td>-2.7</td>
<td>Michael J. Jackson</td>
<td>Specialty Retailers</td>
<td>Retailing</td>
<td>0</td>
<td>USA</td>
<td>Fort Lauderdale, FL</td>
<td>http://www.autonation.com</td>
<td>12</td>
<td>26000</td>
<td>2310</td>
</tr>
</tbody>
</table>
<p>Let's practice using these methods. </p></div>

### Instructions 

<p>Just like in the previous missions, the <code>f500</code> variable we created on the previous screen is available to you here.</p>
<ol>
<li>Use the <code>head()</code> method to select the <strong>first 6 rows</strong>. Assign the result to <code>f500_head</code>.</li>
<li>Use the <code>tail()</code> method to select the <strong>last 8 rows</strong>. Assign the result to <code>f500_tail</code>.</li>
<li>After you have run your code, use the variable inspector and output to view information about the dataframe.</li>
</ol>

In [3]:
f500_head = f500.head(6)
f500_tail = f500.tail(8)

<div><p>Another feature that makes pandas better for working with data is that dataframes can contain more than one data type:</p>
<ul>
<li>Axis values can have string labels, not just numeric ones</li>
<li><strong>Dataframes can contain columns with multiple data types: including integer, float, and string.</strong></li>
</ul>
<p>We can use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dtypes.html#pandas.DataFrame.dtypes" target="_blank"><code>DataFrame.dtypes</code> attribute</a> (similar to NumPy's <a href="http://docs.scipy.org/doc/numpy-1.14.2/reference/generated/numpy.ndarray.dtype.html#numpy.ndarray.dtype" target="_blank"><code>ndarray.dtype</code> attribute</a>) to return information about the types of each column. Let's look at an example using a selection of data stored using the variable name <code>f500_selection</code>.</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_original.svg" alt="f500_selection dataframe"></p>
</div>

```
print(f500_selection.dtypes)
```
```
rank          int64
revenues      int64
profits     float64
country      object
dtype: object
```

<div>
<p>We can see three different data types, or <strong>dtypes</strong>.</p>
<p>You may recognize the <code>float64</code> dtype from our work in NumPy. Pandas uses NumPy dtypes for numeric columns, including <code>int64</code>. There is also a type we haven't seen before, <code>object</code>, which is used for columns that have data that doesn't fit into any other dtypes. This is almost always used for columns containing string values.</p>
<p>When we import data, pandas will attempt to guess the correct dtype for each column. Generally, pandas does a good job with this, which means we don't need to worry about specifying dtypes every time we start to work with data.</p>
<p>If we wanted an overview of all the dtypes used in our dataframe, along with its shape and other information, we could use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html#pandas.DataFrame.info" target="_blank"><code>DataFrame.info()</code> method</a>. Note that <code>DataFrame.info()</code> prints the information, rather than returning it, so we can't assign it to a variable.</p></div>
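
The difference between an attribute that returns a value and a method that only prints can be sketched as follows. We rebuild the five-row `f500_selection` by hand from the values shown on these screens; the names `dtype_series` and `info_result` are our own.

```python
import pandas as pd

# Hand-built copy of the five-row selection used in the examples.
f500_selection = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# dtypes returns a series, so we can store it and look up one column's dtype.
dtype_series = f500_selection.dtypes
print(dtype_series["profits"])  # float64

# info() prints its summary and returns None, so there is nothing useful to store.
info_result = f500_selection.info()
print(info_result)  # None
```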

### Instructions 

<ol>
<li>Use the <code>DataFrame.info()</code> method to display information about the <code>f500</code> dataframe.</li>
<li>After you have run your code, use the variable inspector and output to view information about the dataframe.</li>
</ol>

In [4]:
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   rank                      500 non-null    int64  
 1   revenues                  500 non-null    int64  
 2   revenue_change            498 non-null    float64
 3   profits                   499 non-null    float64
 4   assets                    500 non-null    int64  
 5   profit_change             436 non-null    float64
 6   ceo                       500 non-null    object 
 7   industry                  500 non-null    object 
 8   sector                    500 non-null    object 
 9   previous_rank             500 non-null    int64  
 10  country                   500 non-null    object 
 11  hq_location               500 non-null    object 
 12  website                   500 non-null    object 
 13  years_on_global_500_list  500 non-null    int64  
 14  em

# Selecting a column from a Dataframe by label 

<div><p>In the last exercise, we used the <code>DataFrame.info()</code> method to show us the number of entries in our index (representing the number of rows), a list of each column with their dtype and the number of non-null values, as well as a summary of the different dtypes and memory usage.</p>
<p>Because our axes in pandas have labels, we can select data using those labels — unlike in NumPy, where we needed to know the exact index location. To do this, we can use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html#pandas.DataFrame.loc" target="_blank"><code>DataFrame.loc[]</code> attribute</a>. The syntax for <code>DataFrame.loc[]</code> is:</p>
</div>

```
df.loc[row_label, column_label]
```

<div>
<p>Notice that we use brackets (<code>[]</code>) instead of parentheses (<code>()</code>) when selecting with <code>loc</code>.</p>
<p>Throughout our pandas missions, you'll see <code>df</code> used in code examples as shorthand for a dataframe object. We use this convention because it's also used extensively in the official pandas documentation — getting used to reading it is important.  </p>
<p>Let's look at an example next. We'll again work with just a selection of data stored as <code>f500_selection</code>:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_original.svg" alt="f500_selection dataframe"></p>
<p>Let's select a single column by specifying a <strong>single label</strong>:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_single.svg" alt="loc single column"></p>
<p>Notice we used <code>:</code> to specify that we wish to select all rows. Also note that the returned object has the same row labels as the original dataframe.</p>
<p>We can also use the following shortcut to select a single column:</p>
</div>

```
rank_col = f500_selection["rank"]
print(rank_col)
```
```
Walmart                     1
State Grid                  2
Sinopec Group               3
China National Petroleum    4
Toyota Motor                5
Name: rank, dtype: int64
```

<div>
<p>This style of selecting columns is very commonly seen. We will use it throughout our Dataquest missions.</p>
<p>Let's practice using this technique to select a specific column from our <code>f500</code> dataframe.</p></div>
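
Here is a small sketch comparing the explicit `loc[]` form with the shortcut. It assumes a hand-built copy of `f500_selection` using the values shown above; `rank_explicit` and `rank_shortcut` are names we made up for the comparison.

```python
import pandas as pd

# Hand-built copy of the five-row selection used in the examples.
f500_selection = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# Explicit syntax: all rows (:), one column label.
rank_explicit = f500_selection.loc[:, "rank"]

# Shortcut: column label directly in brackets.
rank_shortcut = f500_selection["rank"]

# Both forms select the same values with the same row labels.
print(rank_explicit.equals(rank_shortcut))  # True
```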

### Instructions 

<ol>
<li>Select the <code>industry</code> column. Assign the result to the variable name <code>industries</code>.</li>
<li>Use Python's <code>type()</code> function to assign the type of <code>industries</code> to <code>industries_type</code>.</li>
<li>After you have run your code, use the variable inspector to look at the variables.</li>
</ol>

In [5]:
industries = f500["industry"]
industries_type = type(industries)

# Introduction to `Series`

<div><p>On the last screen, we observed that when you select just one column of a dataframe, you get a new pandas type: a <strong>series object</strong>. Series is the pandas type for one-dimensional objects. Anytime you see a 1D pandas object, it will be a series. Anytime you see a 2D pandas object, it will be a dataframe.</p>
<p>In fact, you can think of a dataframe as a collection of series objects, which is similar to how pandas stores the data behind the scenes.</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/df_exploded_resized.svg" alt="dataframe exploded"></p>
<p>As we continue learning how to select data, pay attention to which objects are dataframes and which objects are series.</p></div>
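
One way to see a dataframe as a collection of series is to loop over its columns. The sketch below is an illustration under our own assumptions: it uses a hand-built copy of `f500_selection` and the `DataFrame.items()` method, which yields each column label together with its column.

```python
import pandas as pd

# Hand-built copy of the five-row selection used in the examples.
f500_selection = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# Each column comes back as a one-dimensional series with its own dtype.
for label, column in f500_selection.items():
    print(label, type(column).__name__, column.ndim, column.dtype)

# The dataframe itself is two-dimensional.
print(type(f500_selection).__name__, f500_selection.ndim)
```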

# Selecting columns from a Dataframe by label (continued)

<div><p>Next, let's learn how to select multiple columns. As a reminder, here's the selection of data we're working with:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_original.svg" alt="f500_selection dataframe"></p>
<p>Below, we use a <strong>list of labels</strong> to select specific columns:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_list_updated.svg" alt="loc list of columns"></p>
<p>Because the object returned is two-dimensional, we know it's a <em>dataframe</em>, not a series. Again, instead of <code>df.loc[:,["col1","col2"]]</code>, you can also use <code>df[["col1", "col2"]]</code> to select specific columns. </p>
<p>Let's finish by using a <strong>slice object with labels</strong> to select specific columns:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_slice.svg" alt="loc slice of columns"></p>
<p>We again get a dataframe object, with all of the columns from the first up until — <strong>and including</strong> — the last column in our slice. Also note there is no shortcut for selecting column slices.</p>
<p>A summary of the techniques we've learned so far is below:</p>
<p></p><center>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Common Shorthand</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column</td>
<td><code>df.loc[:,"col1"]</code></td>
<td><code>df["col1"]</code></td>
</tr>
<tr>
<td>List of columns</td>
<td><code>df.loc[:,["col1", "col7"]]</code></td>
<td><code>df[["col1", "col7"]]</code></td>
</tr>
<tr>
<td>Slice of columns</td>
<td><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
</tr>
</tbody>
</table>
</center><p></p>
<p>Let's practice using these techniques to select specific columns from our <code>f500</code> dataframe.</p></div>
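
The rows of the summary table can be tried out with the sketch below. It assumes a hand-built copy of `f500_selection`; the result names are ours, chosen for illustration.

```python
import pandas as pd

# Hand-built copy of the five-row selection used in the examples.
f500_selection = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# List of labels: columns come back in the order we list them.
country_rank = f500_selection.loc[:, ["country", "rank"]]
country_rank_short = f500_selection[["country", "rank"]]
print(country_rank.equals(country_rank_short))  # True

# Slice of labels: the end label is included in the result.
revenues_to_country = f500_selection.loc[:, "revenues":"country"]
print(list(revenues_to_country.columns))  # ['revenues', 'profits', 'country']
```

Unlike a Python list slice, a label slice includes its end point, which is why `country` appears in the last result.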


### Instructions 

<ol>
<li>Select the <code>country</code> column. Assign the result to the variable name <code>countries</code>.</li>
<li>In order, select the <code>revenues</code> and <code>years_on_global_500_list</code> columns. Assign the result to the variable name <code>revenues_years</code>.</li>
<li>In order, select all columns from <code>ceo</code> up to and including <code>sector</code>. Assign the result to the variable name <code>ceo_to_sector</code>.</li>
<li>After you have run your code, use the variable inspector to view the variables.</li>
</ol>

In [6]:
countries = f500["country"]
revenues_years = f500[["revenues", "years_on_global_500_list"]]
ceo_to_sector = f500.loc[:, "ceo":"sector"]

# Selecting rows from a Dataframe by label

<div><p>Now that we've learned how to select columns by label, let's learn how to select <em>rows</em> using the labels of the <strong>index</strong> axis:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/df_anatomy_static_resized.svg" alt="anatomy of a dataframe"></p>
<p>We use the same syntax to select rows from a dataframe as we do for columns:</p>
</div>

```
df.loc[row_label, column_label]
```

<div>
<p>We'll again use a selection of our data, stored as the variable <code>f500_selection</code>:</p>
<p><img src="https://s3.amazonaws.com/dq-content/291/loc_original.svg" alt="f500_selection dataframe"></p>
<p><strong>Select a single row</strong></p>
</div>

```
single_row = f500_selection.loc["Sinopec Group"]
print(type(single_row))
print(single_row)
```
```
<class 'pandas.core.series.Series'>

rank             3
revenues    267518
profits     1257.9
country      China
Name: Sinopec Group, dtype: object
```

<div>
<p>Note the object returned is a <em>series</em> because it is one-dimensional. Because this series has to store integer, float, and string values, pandas uses the <code>object</code> dtype, as no single numeric dtype could hold all of them.</p>
<p><strong>Select a list of rows</strong></p>
</div>

```
list_rows = f500_selection.loc[["Toyota Motor", "Walmart"]]
print(type(list_rows))
print(list_rows)
```
```
<class 'pandas.core.frame.DataFrame'>

              rank  revenues  profits country
Toyota Motor     5    254694  16899.3   Japan
Walmart          1    485873  13643.0     USA
```

<div>
<p><strong>Select a slice object with labels</strong> </p>
<p>For selection using slices, we can use the shortcut below. Because plain brackets with a slice are reserved for rows, there is no equivalent shortcut for slicing columns:</p>
</div>

```
slice_rows = f500_selection["State Grid":"Toyota Motor"]
print(type(slice_rows))
print(slice_rows)
```
```
<class 'pandas.core.frame.DataFrame'>

                          rank  revenues  profits country
State Grid                   2    315199   9571.3   China
Sinopec Group                3    267518   1257.9   China
China National Petroleum     4    262573   1867.5   China
Toyota Motor                 5    254694  16899.3   Japan
```
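
Row and column labels can also be combined in a single `loc[]` call, following the `df.loc[row_label, column_label]` syntax. The sketch below assumes a hand-built copy of `f500_selection`; `single_value` and `subset` are names we introduce for illustration.

```python
import pandas as pd

# Hand-built copy of the five-row selection used in the examples.
f500_selection = pd.DataFrame(
    {
        "rank": [1, 2, 3, 4, 5],
        "revenues": [485873, 315199, 267518, 262573, 254694],
        "profits": [13643.0, 9571.3, 1257.9, 1867.5, 16899.3],
        "country": ["USA", "China", "China", "China", "Japan"],
    },
    index=["Walmart", "State Grid", "Sinopec Group",
           "China National Petroleum", "Toyota Motor"],
)

# One row label and one column label give back a single value.
single_value = f500_selection.loc["Toyota Motor", "profits"]
print(single_value)  # 16899.3

# A slice of rows combined with a list of columns gives back a dataframe.
subset = f500_selection.loc["State Grid":"Sinopec Group", ["rank", "country"]]
print(subset)
```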

### Instructions 

<ul>
<li>By selecting data from <code>f500</code>:<ol>
<li>Create a new variable <code>toyota</code>, with:<ul>
<li>Just the row with index <code>Toyota Motor</code>.</li>
<li>All columns.</li>
</ul>
</li>
<li>Create a new variable, <code>drink_companies</code>, with:<ul>
<li>Rows with indices <code>Anheuser-Busch InBev</code>, <code>Coca-Cola</code>, and <code>Heineken Holding</code>, in that order.</li>
<li>All columns.</li>
</ul>
</li>
<li>Create a new variable, <code>middle_companies</code> with:<ul>
<li>All rows with indices from <code>Tata Motors</code> to <code>Nationwide</code>, inclusive.</li>
<li>All columns from <code>rank</code> to <code>country</code>, inclusive.</li>
</ul>
</li>
</ol>
</li>
</ul>

In [7]:
toyota = f500.loc["Toyota Motor"]
drink_companies = f500.loc[["Anheuser-Busch InBev", 
                           "Coca-Cola",
                           "Heineken Holding"]]
middle_companies = f500.loc["Tata Motors":"Nationwide", "rank":"country"]


# Series vs DataFrame

<div><p>On the past couple of screens, we created both series objects and dataframe objects as we selected data from our <code>f500</code> dataframe. Take a minute to review these examples before we continue:</p>
<p></p><center><img src="https://s3.amazonaws.com/dq-content/291/df_series_s_updated.svg" alt="series vs dataframe: series"></center><p></p>
<p></p><center><img src="https://s3.amazonaws.com/dq-content/291/df_series_df_updated.svg" alt="series vs dataframe: dataframe"></center><p></p></div>

# Value Counts method 

<div><p>Because series and dataframes are two distinct objects, they have their own unique methods. Let's look at an example of a series method next - the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html" target="_blank"><code>Series.value_counts()</code> method</a>. This method returns a series containing each unique non-null value in a column along with its count, sorted from the most frequent value to the least frequent.</p>
<p>First, we'll select just one column from the <code>f500</code> dataframe:</p>
</div>

```
sectors = f500["sector"]
print(type(sectors))
```
```
<class 'pandas.core.series.Series'>
```

<div>
<p>Next, we'll substitute "Series" in <code>Series.value_counts()</code> with the name of our <code>sectors</code> series, like below:</p>
</div>

```
sectors_value_counts = sectors.value_counts()
print(sectors_value_counts)
```
```
Financials                       118
Energy                            80
Technology                        44
Motor Vehicles & Parts            34
Wholesalers                       28
Health Care                       27
Food & Drug Stores                20
Transportation                    19
Telecommunications                18
Retailing                         17
Food, Beverages & Tobacco         16
Materials                         16
Industrials                       15
Aerospace & Defense               14
Engineering & Construction        13
Chemicals                          7
Media                              3
Household Products                 3
Hotels, Restaurants & Leisure      3
Business Services                  3
Apparel                            2
Name: sector, dtype: int64
```

<div>
<p>In the resulting series, we can see each unique non-null value in the column and their counts.</p>
<p>Let's see what happens when we try to use the <code>Series.value_counts()</code> method with a dataframe. First, we'll select the <code>sector</code> and <code>industry</code> columns to create a dataframe named <code>sectors_industries</code>:</p>
</div>

```
sectors_industries = f500[["sector", "industry"]]
print(type(sectors_industries))
```
```
<class 'pandas.core.frame.DataFrame'>
```

<div>
<p>Then, we'll try to use the <code>value_counts()</code> method:</p>
</div>

```
si_value_counts = sectors_industries.value_counts()
print(si_value_counts)
```

<div>
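<p>From pandas 1.1 onward, dataframes have their own <code>DataFrame.value_counts()</code> method, which counts unique rows, so this call succeeds. In earlier versions, <code>value_counts()</code> is a <em>series-only</em> method, and we get the following error:</p>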
</div>

```
AttributeError: 'DataFrame' object has no attribute 'value_counts'
```
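
Returning to the series method, the sketch below illustrates how `Series.value_counts()` orders its results and treats null values. The `sectors_sample` series is a made-up example, not taken from `f500`, and `dropna=False` is an optional parameter of the method that keeps nulls in the count.

```python
import pandas as pd

# A made-up series with repeated values and one missing value.
sectors_sample = pd.Series(
    ["Energy", "Retailing", "Energy", None, "Energy", "Retailing", "Technology"]
)

# By default, nulls are skipped and counts are sorted from highest to lowest.
counts = sectors_sample.value_counts()
print(counts)

# Passing dropna=False also counts the missing value.
counts_with_nulls = sectors_sample.value_counts(dropna=False)
print(counts_with_nulls.sum())  # 7
```

The default result sums to 6, not 7, because the missing value is left out.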

### Instructions

<p>We've already saved a selection of data from <code>f500</code> to a dataframe named <code>f500_sel</code>.</p>
<ol>
<li>Find the counts of each unique value in the <code>country</code> column in the <code>f500_sel</code> dataframe.<ul>
<li>Select the <code>country</code> column in the <code>f500_sel</code> dataframe. Assign it to a variable named <code>countries</code>.</li>
<li>Use the <code>Series.value_counts()</code> method to return the value counts for <code>countries</code>. Assign the results to <code>country_counts</code>.</li>
</ul>
</li>
</ol>

In [16]:
f500_sel = f500.loc[f500["rank"] < 100]
print(f500_sel.tail(3))

countries = f500_sel["country"]
country_counts = countries.value_counts()

                     rank  revenues  revenue_change  profits  assets  \
Johnson & Johnson      97     71890             2.6  16540.0  141208   
Procter & Gamble       98     71726            -8.9  10508.0  127136   
U.S. Postal Service    99     71498             3.7  -5591.0   25219   

                     profit_change               ceo  \
Johnson & Johnson              7.3       Alex Gorsky   
Procter & Gamble              49.3   David S. Taylor   
U.S. Postal Service            NaN  Megan J. Brennan   

                                                industry              sector  \
Johnson & Johnson                        Pharmaceuticals         Health Care   
Procter & Gamble         Household and Personal Products  Household Products   
U.S. Postal Service  Mail, Package, and Freight Delivery      Transportation   

                     previous_rank country        hq_location  \
Johnson & Johnson              103     USA  New Brunswick, NJ   
Procter & Gamble                86 

# Selecting items from a Series by label

<div><p>In the last exercise, we practiced using the <code>Series.value_counts()</code> method. Next, let's find the counts of each unique value in the <code>country</code> column for the entire <code>f500</code> dataframe:</p>
</div>

```
countries = f500["country"]
country_counts = countries.value_counts()
```
```
USA             132
China           109
Japan            51
Germany          29
France           29
Britain          24
South Korea      15
Netherlands      14
Switzerland      14
Canada           11
Spain             9
Brazil            7
Australia         7
India             7
Italy             7
Taiwan            6
Russia            4
Ireland           4
Sweden            3
Singapore         3
Mexico            2
Israel            1
Turkey            1
Norway            1
Thailand          1
Belgium           1
Venezuela         1
Luxembourg        1
Denmark           1
Indonesia         1
Malaysia          1
Saudi Arabia      1
Finland           1
U.A.E             1
Name: country, dtype: int64
```

<div>
<p>However, what if we wanted to select just the count for India? Or the counts for just the countries in North America?</p>
<p>As with dataframes, we can use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.loc.html" target="_blank"><code>Series.loc[]</code></a> to select items from a series using single labels, a list, or a slice object. We can also omit <code>loc[]</code> and use bracket shortcuts for all three:</p>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single item from series</td>
<td><code>s.loc["item8"]</code></td>
<td><code>s["item8"]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.loc[["item1","item7"]]</code></td>
<td><code>s[["item1","item7"]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.loc["item2":"item4"]</code></td>
<td><code>s["item2":"item4"]</code></td>
</tr>
</tbody>
</table>
<p>Let's practice selecting data from pandas series:</p></div>
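
The three rows of the table above can be sketched on a short series. Here `counts_sample` is a hand-built series holding the first four counts from the output above, and the result names are ours.

```python
import pandas as pd

# The first four country counts from the output above.
counts_sample = pd.Series(
    [132, 109, 51, 29], index=["USA", "China", "Japan", "Germany"]
)

# Single label: explicit syntax and shortcut return the same value.
usa = counts_sample.loc["USA"]
usa_short = counts_sample["USA"]
print(usa, usa_short)  # 132 132

# List of labels: returns a series in the order listed.
japan_china = counts_sample[["Japan", "China"]]
print(japan_china)

# Slice of labels: the end label is included.
china_to_germany = counts_sample["China":"Germany"]
print(list(china_to_germany.index))  # ['China', 'Japan', 'Germany']
```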

### Instructions 

<ul>
<li>From the pandas series <code>countries_counts</code>:<ol>
<li>Select the item at index label <code>India</code>. Assign the result to the variable name <code>india</code>.</li>
<li>In order, select the items with index labels <code>USA</code>, <code>Canada</code>, and <code>Mexico</code>. Assign the result to the variable name <code>north_america</code>.</li>
</ol>
</li>
</ul>

In [17]:
countries = f500['country']
countries_counts = countries.value_counts()

india = countries_counts["India"]
north_america = countries_counts[["USA", "Canada", "Mexico"]]

---
# Summary challenge

<div><p>Let's take a look at a summary of all the different label selection mechanisms we've learned in this mission:</p>
<table>
<thead>
<tr>
<th>Select by Label</th>
<th>Explicit Syntax</th>
<th>Shorthand Convention</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single column from dataframe</td>
<td><code>df.loc[:,"col1"]</code></td>
<td><code>df["col1"]</code></td>
</tr>
<tr>
<td>List of columns from dataframe</td>
<td><code>df.loc[:,["col1","col7"]]</code></td>
<td><code>df[["col1","col7"]]</code></td>
</tr>
<tr>
<td>Slice of columns from dataframe</td>
<td><code>df.loc[:,"col1":"col4"]</code></td>
<td></td>
</tr>
<tr>
<td>Single row from dataframe</td>
<td><code>df.loc["row4"]</code></td>
<td></td>
</tr>
<tr>
<td>List of rows from dataframe</td>
<td><code>df.loc[["row1", "row8"]]</code></td>
<td></td>
</tr>
<tr>
<td>Slice of rows from dataframe</td>
<td><code>df.loc["row3":"row5"]</code></td>
<td><code>df["row3":"row5"]</code></td>
</tr>
<tr>
<td>Single item from series</td>
<td><code>s.loc["item8"]</code></td>
<td><code>s["item8"]</code></td>
</tr>
<tr>
<td>List of items from series</td>
<td><code>s.loc[["item1","item7"]]</code></td>
<td><code>s[["item1","item7"]]</code></td>
</tr>
<tr>
<td>Slice of items from series</td>
<td><code>s.loc["item2":"item4"]</code></td>
<td><code>s["item2":"item4"]</code></td>
</tr>
</tbody>
</table>
<p>Next, let's practice what we've learned!</p></div>

### Instructions 

<p>By selecting data from <code>f500</code>:</p>
<ol>
<li>Create a new variable <code>big_movers</code>, with:  <ul>
<li>Rows with indices <code>Aviva</code>, <code>HP</code>, <code>JD.com</code>, and <code>BHP Billiton</code>, in that order.</li>
<li>The <code>rank</code> and <code>previous_rank</code> columns, in that order.     </li>
</ul>
</li>
<li>Create a new variable, <code>bottom_companies</code> with:<ul>
<li>All rows with indices from <code>National Grid</code> to <code>AutoNation</code>, inclusive.</li>
<li>The <code>rank</code>, <code>sector</code>, and <code>country</code> columns.</li>
</ul>
</li>
</ol>

In [18]:
big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], 
                      ["rank", "previous_rank"]]

bottom_companies = f500.loc["National Grid":"AutoNation",
                           ["rank", "sector", "country"]]