# Creating, Loading, and Selecting Data with Pandas
## Importing the Pandas Module
The `pandas` module is usually imported at the top of a Python file under the alias `pd`.
Whenever we need to access the `pandas` module, we can do so through `pd`.

In [3]:
import pandas as pd

## Create a DataFrame 1

A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, an Excel spreadsheet, or a SQL query.

DataFrames have rows and columns. Each column has a name, which is a string. Each row has an index, which is an integer. DataFrames can contain many different data types: strings, ints, floats, tuples, etc.

You can pass in a dictionary to `pd.DataFrame()`. Each key is a column name and each value is a list of column values. The columns must all be the same length or you will get an error. Here’s an example:

In [4]:
df1 = pd.DataFrame({
    'name' : ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address' : ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age' : [34, 28, 51]
})

print(df1)

         name         address  age
0  John Smith    123 Main St.   34
1    Jane Doe  456 Maple Ave.   28
2   Joe Schmo    789 Broadway   51
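
As a minimal sketch of the length requirement described above, the snippet below passes two columns of unequal length to `pd.DataFrame()`. In current pandas releases this raises a `ValueError`. The column names `a` and `b` are placeholders chosen only for illustration.

```python
import pandas as pd

# Columns 'a' and 'b' (illustrative names) have different lengths.
try:
    pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5]})
    raised = False
except ValueError as err:
    raised = True
    print('Could not build the DataFrame:', err)
```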


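Note that the columns appear in the order in which the dictionary keys were written (`name`, `address`, `age`), as the output above shows. Python dictionaries preserve insertion order as of Python 3.7; much older versions of Python and pandas did not guarantee this and could sort the columns alphabetically instead.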

You run an online clothes store called Panda's Wardrobe. You need a DataFrame containing information about your products. 

Create a DataFrame with the following data that your inventory manager sent you.

In [5]:
df1 = pd.DataFrame({
    'Product ID' : [1, 2, 3, 4],
    'Product Name' : ['t-shirt', 't-shirt', 'skirt', 'skirt'], 
    'Color' : ['blue', 'green', 'red', 'black']
})

print(df1)

   Product ID Product Name  Color
0           1      t-shirt   blue
1           2      t-shirt  green
2           3        skirt    red
3           4        skirt  black


## Create a DataFrame 2

You can also add data using lists.

For example, you can pass in a list of lists, where each one represents a row of data. Use the keyword argument `columns` to pass a list of column names.

In [6]:
df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '789 Broadway', 51],
    ],
    columns = ['name', 'address', 'age']
)

print(df2)

         name         address  age
0  John Smith    123 Main St.   34
1    Jane Doe  456 Maple Ave.   28
2   Joe Schmo    789 Broadway   51


You're running a chain of pita shops called Pita Power. You want to create a DataFrame with information on your different store locations. 

Use a list of lists to create a DataFrame.

In [7]:
df2 = pd.DataFrame([
    [1, 'San Diego', 100],
    [2, 'Los Angeles', 120],
    [3, 'San Francisco', 90],
    [4, 'Sacramento', 115],
    ],
    columns = ['Store ID', 'Location', 'Number of Employees']
)

print(df2)

   Store ID       Location  Number of Employees
0         1      San Diego                  100
1         2    Los Angeles                  120
2         3  San Francisco                   90
3         4     Sacramento                  115


## Comma Separated Values (CSV)

Most of the time, we'll be working with datasets that already exist. One of the most common formats for big datasets is the *CSV*. 

*CSV (comma separated values)* is a text-only spreadsheet format. You can find CSVs in lots of places:
- Online datasets (here’s an example from [data.gov](https://catalog.data.gov/dataset?res_format=CSV))
- Export from Excel or Google Sheets
- Export from SQL

The first row of a CSV contains column headings. All subsequent rows contain values. Each column heading and each value is separated by a comma:

column1,column2,column3

value1,value2,value3

That example CSV represents the following table:

             column1      column2     column3                          
             value1       value2      value3

You run a cupcake store and want to create a record of all of the cupcakes that you offer. Write data as a CSV in `cupcakes.csv`

In [8]:
# name,cake_flavor,frosting_flavor,topping
# Devil's Food,chocolate,chocolate,chocolate shavings
# Birthday Cake,vanilla,vanilla,rainbow sprinkles
# Carrot cake,carrot,cream cheese,almonds

## Loading and Saving CSVs
When you have data in a CSV, you can load it into a DataFrame in pandas using `pd.read_csv()`:

In [9]:
# The following command looks for a CSV file called my-csv-file.csv in the current directory.
# If no such file exists, pandas raises FileNotFoundError, so create a sample file first.

df = pd.read_csv('my-csv-file.csv')

In the example above, the function `pd.read_csv()` is called with the file name `my-csv-file.csv` as its argument. If no file with that name exists in the current directory, pandas raises a `FileNotFoundError`.
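
To try `pd.read_csv()` without creating a file first, the sketch below reads CSV text from memory. It assumes the cupcake rows from the previous section and uses `io.StringIO` as a stand-in for a file on disk; the variable names `csv_text` and `cupcakes` are chosen for illustration.

```python
import io
import pandas as pd

# The cupcake rows from earlier, held in a string that stands in for cupcakes.csv.
csv_text = """name,cake_flavor,frosting_flavor,topping
Devil's Food,chocolate,chocolate,chocolate shavings
Birthday Cake,vanilla,vanilla,rainbow sprinkles
Carrot cake,carrot,cream cheese,almonds
"""

# read_csv accepts a file path or any file-like object.
cupcakes = pd.read_csv(io.StringIO(csv_text))
print(cupcakes)
```

The first line of the text becomes the column names and each following line becomes one row.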

We can also save data to a CSV, using `.to_csv()`:

In [10]:
df.to_csv('new-csv-file.csv')
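
One detail worth knowing: by default `.to_csv()` also writes the row index as an extra first column. The sketch below, which uses made-up city data and in-memory buffers instead of real files, compares the default with `index=False`.

```python
import io
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'city': ['Maplewood', 'Wayne'],
                   'population': [100000, 350000]})

# Default: the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
round_trip = pd.read_csv(without_index)
print(round_trip.columns.tolist())
```

Passing `index=False` is usually what you want when the index is just the default row numbers.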

You’re working for the County of Whoville and you just received a CSV of data about the different cities in your county. Read the CSV `sample.csv` into a variable called `df`, so that you can learn more about the cities.

In [11]:
df = pd.read_csv('sample.csv')
print(df)

            City  Population  Median Age
0      Maplewood      100000          40
1          Wayne      350000          33
2  Forrest Hills      300000          35
3        Paramus      400000          55
4     Hackensack      290000          39


## Inspect a DataFrame

To see what a DataFrame loaded from a CSV looks like, we can use `print(df)`. This works best for a small DataFrame. For a larger DataFrame, it's helpful to be able to inspect a few rows without having to look at the entire DataFrame.

The method `.head()` gives the first 5 rows of a DataFrame. 

If you want to see more rows, you can pass in the positional argument `n`. 
For example, `df.head(10)` would show the first 10 rows.

The method **`df.info()`** prints a summary of each column: its name, the number of non-null values, and its data type.
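
A small sketch of both methods, using a made-up 20-row DataFrame. Note that `.info()` prints its summary itself and returns `None`, so wrapping it in `print()` adds a trailing `None` to the output.

```python
import pandas as pd

# A 20-row DataFrame with made-up values.
df = pd.DataFrame({'n': range(20), 'square': [i * i for i in range(20)]})

first_five = df.head()     # default: first 5 rows
first_ten = df.head(10)    # first 10 rows
print(first_five)
print(len(first_ten))

# info() prints directly and returns None.
info_result = df.info()
```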

You’re working for a Hollywood studio, trying to use data to predict the next big hit. Load the CSV `imdb.csv` into a variable called `df`, so that you can learn about popular movies from the past 90 years.

In [12]:
df = pd.read_csv('imdb.csv')

print(df.head())

print(df.info())

   id                                       name   genre  year  imdb_rating
0   1                                     Avatar  action  2009          7.9
1   2                             Jurassic World  action  2015          7.3
2   3                               The Avengers  action  2012          8.1
3   4                            The Dark Knight  action  2008          9.0
4   5  Star Wars: Episode I - The Phantom Menace  action  1999          6.6
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 220 entries, 0 to 219
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           220 non-null    int64  
 1   name         220 non-null    object 
 2   genre        220 non-null    object 
 3   year         220 non-null    int64  
 4   imdb_rating  220 non-null    float64
dtypes: float64(1), int64(2), object(2)
memory usage: 8.7+ KB
None


## Select Columns
Suppose you have the DataFrame called `customers`, which contains the ages of your customers: 

![image.png](attachment:image.png)

Suppose you want to take the average or plot a histogram of the ages. To do either of these tasks, you need to select the column. There are two possible syntaxes for selecting all values from a column:
1. Select the column as if you were selecting a value from a dictionary using a key. In our example, we would type `customers['age']` to select the ages.
2. If the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.), then you can select it using the following notation: `df.MySecondColumn`. In our example, we would type `customers.age`.

When we select a single column, the result is called a *Series*.
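
The two syntaxes can be compared side by side. This is a minimal sketch with a hypothetical `customers` table; the names and ages are invented.

```python
import pandas as pd

# Hypothetical customer data for illustration.
customers = pd.DataFrame({
    'name': ['Customer A', 'Customer B', 'Customer C'],
    'age': [25, 31, 40],
})

ages_by_key = customers['age']   # dictionary-style selection
ages_by_attr = customers.age     # attribute-style selection

print(type(ages_by_key))
print(ages_by_key.equals(ages_by_attr))   # both give the same Series
print(ages_by_key.mean())                 # average age
```

The bracket form works for any column name, including names with spaces such as `'Product ID'`, which is why it is the safer default.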

In [13]:
df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)

print(df)

      month  clinic_east  clinic_north  clinic_south  clinic_west
0   January          100           100            23          100
1  February           51            45           145           45
2     March           81            96            65           96
3     April           80            80            54          180
4       May           51            54            54          154
5      June          112           109            79          129


The DataFrame `df` represents data collected by four health clinics run by the same organization. Each row represents a month from January through June and shows the number of appointments made at four different clinics.

You want to analyze what’s been happening at the North location. Create a variable called `clinic_north` that contains ONLY the data from the column `clinic_north`.


In [14]:
clinic_north = df.clinic_north
print(clinic_north)


0    100
1     45
2     96
3     80
4     54
5    109
Name: clinic_north, dtype: int64


In [15]:
print(type(clinic_north)) # a single column is a Series
print(type(df))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>


## Selecting Multiple Columns

When we have a larger DataFrame, we might want to select just a few columns.

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame, we would use:

    `new_df = orders[['column_2', 'column_3']]`
    
**Note:** Make sure that you have a double set of brackets (`[[ ]]`), or this command won’t work!
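
To see why the double brackets matter, the sketch below builds a small stand-in `orders` table (the column values are invented) and tries both forms. With single brackets, pandas looks for one column whose name is the pair of names and raises a `KeyError`.

```python
import pandas as pd

# Small stand-in table with invented values.
orders = pd.DataFrame({
    'column_1': [1, 2],
    'column_2': ['a', 'b'],
    'column_3': [0.5, 1.5],
})

new_df = orders[['column_2', 'column_3']]   # list of names -> DataFrame
print(type(new_df))

try:
    orders['column_2', 'column_3']          # single brackets: treated as one key
    single_brackets_failed = False
except KeyError:
    single_brackets_failed = True
print(single_brackets_failed)
```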

Let's compare the visits of the Northern and Southern clinics.

In [16]:
clinic_north_south = df[['clinic_north', 'clinic_south']]
print(clinic_north_south)
print(type(clinic_north_south)) # selecting multiple columns gives a DataFrame

   clinic_north  clinic_south
0           100            23
1            45           145
2            96            65
3            80            54
4            54            54
5           109            79
<class 'pandas.core.frame.DataFrame'>


## Select Rows
Let’s revisit our `orders` from ShoeFly.com:
![image.png](attachment:image.png)
Maybe our Customer Service department has just received a message from Joyce Waller, so we want to know exactly what she ordered. We want to select this single row of data.

DataFrames are **zero-indexed**, meaning that we start with the 0th row and count up from there. Joyce Waller’s order is at index 2, which makes it the third row from the top.

We select it using the following command: `orders.iloc[2]`

When we select a single row, the result is a *Series* (just like when we select a single column).
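
A minimal sketch of selecting one row by position, using an invented stand-in for the `orders` table:

```python
import pandas as pd

# Made-up stand-in for the orders table.
orders = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'shoe_type': ['clogs', 'boots', 'sandals', 'wedges'],
})

row = orders.iloc[2]       # position 2 is the third row
print(type(row))
print(row['shoe_type'])    # values are looked up by column name
```

The column names of the DataFrame become the index of the resulting Series.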

You want to know how many visits took place in March last year, to help you prepare.
Write a command that produces a Series made up of the March data from `df` for all four clinic sites, and save it to the variable `march`.

In [17]:
march = df.iloc[2]
print(march)

month           March
clinic_east        81
clinic_north       96
clinic_south       65
clinic_west        96
Name: 2, dtype: object


## Selecting Multiple Rows

We can also select multiple rows from a DataFrame. There are different ways of selecting multiple rows:
- `orders.iloc[3:7]` would select all rows starting at the 3rd row and up to but *not including* the 7th row (i.e., the 3rd row, 4th row, 5th row, and 6th row) 
- `orders.iloc[:4]` would select all rows up to, but *not including* the 4th row (i.e., the 0th, 1st, 2nd, and 3rd rows) 
- `orders.iloc[-3:]` would select the rows starting at the 3rd to last row and up to and *including* the final row 
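
The three slices above can be checked on a small example. The sketch assumes a made-up ten-row `orders` table, so the index labels run from 0 to 9.

```python
import pandas as pd

orders = pd.DataFrame({'order_number': range(10)})  # made-up data

middle = orders.iloc[3:7].index.tolist()    # rows 3, 4, 5, 6
first = orders.iloc[:4].index.tolist()      # rows 0, 1, 2, 3
last = orders.iloc[-3:].index.tolist()      # the final three rows

print(middle)
print(first)
print(last)
```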

One of your doctors thinks that there are more clinic visits in the late Spring.

Write a command that will produce a DataFrame made up of the data for April, May, and June from `df` for all four sites (the rows at indices 3 through 5), and save it to `april_may_june`.


In [18]:
april_may_june = df.iloc[-3:]
print(april_may_june)

   month  clinic_east  clinic_north  clinic_south  clinic_west
3  April           80            80            54          180
4    May           51            54            54          154
5   June          112           109            79          129


## Selecting Rows with Logic 1

You can select a subset of a DataFrame by using logical statements: `df[df.MyColumnName == desired_column_value]`

We have a large DataFrame with information about our customers. A few of the many rows look like this:
![image.png](attachment:image.png)
Suppose we want to select all rows where the customer's age is 30. We would use: `df[df.age == 30]`

We can use other logical statements, such as:
- Greater Than, > : `df[df.age > 30]`
- Less Than, < : `df[df.age < 30]`
- Not Equal, != : `df[df.name != 'Clara Oswald']`
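
It can help to look at what the logical statement produces on its own. In this sketch (hypothetical customer data), the comparison yields a boolean Series with one value per row, and indexing with it keeps only the rows marked `True`.

```python
import pandas as pd

# Hypothetical customer data for illustration.
customers = pd.DataFrame({
    'name': ['Customer A', 'Customer B', 'Customer C', 'Customer D'],
    'age': [30, 45, 22, 30],
})

mask = customers.age == 30          # boolean Series, one value per row
print(mask.tolist())

thirty = customers[mask]            # rows where the mask is True
older = customers[customers.age > 30]
print(thirty)
print(older)
```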


You’re going to staff the clinic for January of this year. You want to know how many visits took place in January of last year, to help you prepare.

Create a variable `january` using a logical statement that selects the row of `df` where the `'month'` column is `'January'`.


In [19]:
january = df[df.month == 'January']
print(january)

     month  clinic_east  clinic_north  clinic_south  clinic_west
0  January          100           100            23          100


## Selecting Rows with Logic 2
We can also combine multiple logical statements, as long as each statement is in parentheses.

For instance, suppose we wanted to select all rows where the customer’s age was under 30 *or* the customer’s name was “Martha Jones”:
![image.png](attachment:image.png)

We could use the following code: 

`df[(df.age < 30) |
   (df.name == 'Martha Jones')]`
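
The sketch below shows both combining operators on invented data: `|` means "or" and `&` means "and". The parentheses are required because `|` and `&` bind more tightly than comparisons such as `<` and `==` in Python.

```python
import pandas as pd

# Hypothetical customer data for illustration.
customers = pd.DataFrame({
    'name': ['Customer A', 'Customer B', 'Customer C'],
    'age': [25, 41, 33],
})

# Rows matching either condition.
either = customers[(customers['age'] < 30) | (customers['name'] == 'Customer B')]

# Rows matching both conditions.
both = customers[(customers['age'] > 30) & (customers['name'] == 'Customer B')]

print(either)
print(both)
```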


You want to see how the number of clinic visits changed between March and April.

Create the variable `march_april`, which contains the data from March and April.

In [20]:
march_april = df[(df.month == 'March') | (df.month == 'April')]
print(march_april)

   month  clinic_east  clinic_north  clinic_south  clinic_west
2  March           81            96            65           96
3  April           80            80            54          180


## Selecting Rows with Logic 3
Suppose we want to select the rows where the customer’s name is either “Martha Jones”, “Rose Tyler” or “Amy Pond”.
![image.png](attachment:image.png)
We could use the `.isin()` method to check that `df.name` is one of a list of values:

`df[df.name.isin(['Martha Jones',
     'Rose Tyler',
     'Amy Pond'])]`
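
A minimal sketch of `.isin()` on invented data. It also shows the `~` operator, which inverts a boolean Series and so selects the rows whose value is not in the list.

```python
import pandas as pd

# Hypothetical customer data for illustration.
customers = pd.DataFrame({
    'name': ['Customer A', 'Customer B', 'Customer C', 'Customer D'],
    'age': [25, 41, 33, 58],
})

wanted = ['Customer A', 'Customer C']

selected = customers[customers['name'].isin(wanted)]    # name is in the list
others = customers[~customers['name'].isin(wanted)]     # name is not in the list

print(selected)
print(others)
```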
     
Another doctor thinks that you have a lot of clinic visits in the late Winter.

Create the variable `january_february_march`, containing the data from January, February, and March. 

In [21]:
january_february_march = df[df.month.isin(['January', 'February', 'March'])]
print(january_february_march)

      month  clinic_east  clinic_north  clinic_south  clinic_west
0   January          100           100            23          100
1  February           51            45           145           45
2     March           81            96            65           96


## Setting Indices

When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant, and it means the row labels no longer match the positions used by `.iloc[]`. 

We can fix this using the method `.reset_index()`. For example, here is a DataFrame called `df` with non-consecutive indices: 
![image.png](attachment:image.png)

If we use the command `df.reset_index()`, we get a new DataFrame with a new set of indices: 
![image.png](attachment:image.png)
**Note:** the old indices have been moved into a new column called `'index'`. Unless you need those values for something special, it's probably better to use the keyword `drop=True` so that you don't end up with that extra column. If we run the command `df.reset_index(drop=True)`, we get a new DataFrame that looks like this:

![image.png](attachment:image.png)
Using `.reset_index()` will return a new DataFrame, but we usually just want to modify our existing DataFrame. If we use the keyword `inplace=True`, the existing DataFrame is modified directly and the method returns `None`.
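
The three variants can be compared in one short sketch. The data is made up, and the variable names `kept`, `dropped`, and `working` are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': ['January', 'February', 'March', 'April']})
subset = df[df.month.isin(['February', 'April'])]
print(subset.index.tolist())                # labels are no longer consecutive

kept = subset.reset_index()                 # old labels move to an 'index' column
dropped = subset.reset_index(drop=True)     # old labels are discarded
print(kept.columns.tolist())
print(dropped.index.tolist())

# inplace=True modifies the object itself and returns None.
working = subset.copy()
result = working.reset_index(drop=True, inplace=True)
print(result)
print(working.index.tolist())
```

Because the in-place call returns `None`, assigning its result to a variable is almost always a mistake.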

Let's look at the clinic data we have along with a subset `df2` of rows from `df`:

In [22]:
df2 = df.loc[[1, 3, 5]]
print(df2)

      month  clinic_east  clinic_north  clinic_south  clinic_west
1  February           51            45           145           45
3     April           80            80            54          180
5      June          112           109            79          129


Create a new DataFrame called `df3` by resetting the indices on `df2` (don’t use `inplace` or `drop`). Did `df2` change after you ran this command?

In [23]:
df3 = df2.reset_index()
print(df3)

   index     month  clinic_east  clinic_north  clinic_south  clinic_west
0      1  February           51            45           145           45
1      3     April           80            80            54          180
2      5      June          112           109            79          129


In [24]:
print(df2)

      month  clinic_east  clinic_north  clinic_south  clinic_west
1  February           51            45           145           45
3     April           80            80            54          180
5      June          112           109            79          129


In [25]:
# as we can see above, df2 didn't change
# now we reset the indices with drop=True; without inplace=True, df2 still stays unchanged

df3 = df2.reset_index(drop=True)
print(df3)
print(df2)

      month  clinic_east  clinic_north  clinic_south  clinic_west
0  February           51            45           145           45
1     April           80            80            54          180
2      June          112           109            79          129
      month  clinic_east  clinic_north  clinic_south  clinic_west
1  February           51            45           145           45
3     April           80            80            54          180
5      June          112           109            79          129


In [26]:
df3 = df2.reset_index(inplace=True) # with inplace=True the method returns None, so df3 is None
print(df3)                          # passing inplace=True and drop=True together would also drop the old index
print(df2)

None
   index     month  clinic_east  clinic_north  clinic_south  clinic_west
0      1  February           51            45           145           45
1      3     April           80            80            54          180
2      5      June          112           109            79          129


In [27]:
# as the output above shows, df3 is None and df2 has been reset in place

------------------
## ShoeFly.com Example
We will be the Data Analyst for ShoeFly.com, a fictional online shoe store.

1. First we load the data from `shoefly.csv` into the variable `orders`
2. Inspect the first 5 lines of the data
3. Your marketing department wants to send out an email blast to everyone who ordered shoes! Select all of the email addresses from the column `email` and save them to a variable called `emails`.
4. Frances Palmer claims that her order was wrong. What did Frances Palmer order? Use logic to select that row of `orders` and save it to the variable `frances_palmer`.
5. We need some customer reviews for our comfortable shoes. Select all orders for `shoe_type`: `clogs`, `boots`, and `ballet flats` and save them to the variable `comfy_shoes`.


In [28]:
import pandas as pd

orders = pd.read_csv('shoefly.csv')
print(orders.head())
emails = orders.email
print(emails)

      id first_name last_name                         email     shoe_type  \
0  54791    Rebecca   Lindsay  RebeccaLindsay57@hotmail.com         clogs   
1  53450      Emily     Joyce        EmilyJoyce25@gmail.com  ballet flats   
2  91987      Joyce    Waller        Joyce.Waller@gmail.com       sandals   
3  14437     Justin  Erickson   Justin.Erickson@outlook.com         clogs   
4  79357     Andrew     Banks              AB4318@gmail.com         boots   

  shoe_material shoe_color  
0  faux-leather      black  
1  faux-leather       navy  
2        fabric      black  
3  faux-leather        red  
4       leather      brown  
0     RebeccaLindsay57@hotmail.com
1           EmilyJoyce25@gmail.com
2           Joyce.Waller@gmail.com
3      Justin.Erickson@outlook.com
4                 AB4318@gmail.com
5           JulieMarsh59@gmail.com
6                 TJ5470@gmail.com
7           Janice.Hicks@gmail.com
8        GabrielPorter24@gmail.com
9        FrancesPalmer50@gmail.com
10         Je

In [29]:
frances_palmer = orders[(orders.first_name == 'Frances') & (orders.last_name == 'Palmer')]
print(frances_palmer)

      id first_name last_name                      email shoe_type  \
9  62083    Frances    Palmer  FrancesPalmer50@gmail.com    wedges   

  shoe_material shoe_color  
9       leather      white  


In [30]:
comfy_shoes = orders[orders.shoe_type.isin(['clogs', 'boots', 'ballet flats'])]
print(comfy_shoes)

       id first_name   last_name                         email     shoe_type  \
0   54791    Rebecca     Lindsay  RebeccaLindsay57@hotmail.com         clogs   
1   53450      Emily       Joyce        EmilyJoyce25@gmail.com  ballet flats   
3   14437     Justin    Erickson   Justin.Erickson@outlook.com         clogs   
4   79357     Andrew       Banks              AB4318@gmail.com         boots   
6   20487     Thomas      Jensen              TJ5470@gmail.com         clogs   
7   76971     Janice       Hicks        Janice.Hicks@gmail.com         clogs   
8   21586    Gabriel      Porter     GabrielPorter24@gmail.com         clogs   
10  91629    Jessica        Hale       JessicaHale25@gmail.com         clogs   
12  45832      Susan      Dennis       SusanDennis58@gmail.com  ballet flats   
14  73431    Rebecca     Charles     Rebecca.Charles@gmail.com         boots   
16  39888    Vincent  Stephenson            VS4753@outlook.com         boots   
17  35961        Roy     Tillman        

----------
# Modifying DataFrames
## Adding a Column 1

One way to add a new column is by giving a list of the same length as the existing DataFrame.

Suppose we own a hardware store called The Handy Woman and have a DataFrame containing inventory information:
![image.png](attachment:image.png)
We can add a column, `Quantity`, for each of the products in our warehouse using the following: `df['Quantity'] = [100, 150, 50, 35]`

Our new DataFrame would look like:
![image.png](attachment:image.png)
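
The list has to supply exactly one value per row. The sketch below adds `Quantity` to a cut-down version of the inventory table and then tries a list that is too short, which raises a `ValueError` in current pandas releases. The column name `Too Short` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Product ID': [1, 2, 3, 4],
    'Description': ['3 inch screw', '2 inch nail', 'hammer', 'screwdriver'],
})

df['Quantity'] = [100, 150, 50, 35]     # one value per row

try:
    df['Too Short'] = [1, 2]            # only two values for four rows
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('Column not added:', err)

print(df)
```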

Example: The DataFrame `df` contains information on products sold at a hardware store. Add a column to `df` called `Sold in Bulk?` which indicates if the product is sold in bulk or individually.

In [40]:
df = pd.DataFrame(
[
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)

df['Sold in Bulk?'] = ['Yes', 'Yes', 'No', 'No']

print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?
0           1  3 inch screw                  0.5   0.75           Yes
1           2   2 inch nail                  0.1   0.25           Yes
2           3        hammer                  3.0   5.50            No
3           4   screwdriver                  2.5   3.00            No


## Adding a Column 2
We can also add a new column that is the same for all rows in the DataFrame. 

We will add a column to `df` called `Is taxed?`, which indicates whether or not to collect sales tax on the product. It should be `Yes` for all rows.

In [38]:
df['Is taxed?'] = 'Yes'
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?  \
0           1  3 inch screw                  0.5   0.75           Yes   
1           2   2 inch nail                  0.1   0.25           Yes   
2           3        hammer                  3.0   5.50            No   
3           4   screwdriver                  2.5   3.00            No   

  Is taxed?  
0       Yes  
1       Yes  
2       Yes  
3       Yes  


## Adding a Column 3
You can also add a new column by performing an operation on the existing columns.

Suppose we want to add a column to our inventory table with the amount of sales tax that we need to charge for each item. The following code multiplies each `Price` by `0.075`, the sales tax rate for our state: `df['Sales Tax'] = df.Price * 0.075`
Now our table has a column called `Sales Tax`:
![image.png](attachment:image.png)
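
Here is the sales tax calculation as a runnable sketch on two of the products. The extra `Total` column is an assumption added for illustration; it shows that a new column can also combine two existing columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Description': ['hammer', 'screwdriver'],
    'Price': [5.50, 3.00],
})

# The operation is applied to every row at once.
df['Sales Tax'] = df.Price * 0.075

# Hypothetical extra column: price plus tax.
df['Total'] = df['Price'] + df['Sales Tax']
print(df)
```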

Add a column to `df` called `Revenue`, which is equal to the difference between the `Price` and the `Cost to Manufacture`.

In [44]:
df['Revenue'] = df['Price'] - df['Cost to Manufacture']
print(df)

   Product ID   Description  Cost to Manufacture  Price Sold in Bulk?  Revenue
0           1  3 inch screw                  0.5   0.75           Yes     0.25
1           2   2 inch nail                  0.1   0.25           Yes     0.15
2           3        hammer                  3.0   5.50            No     2.50
3           4   screwdriver                  2.5   3.00            No     0.50


## Performing Column Operations
We can use the `.apply()` method to apply a function to every value in a particular column.

For example, the code below overwrites the existing `Name` column by applying the string method `str.upper` to every row in `Name`.

`df['Name'] = df['Name'].apply(str.upper)`

![image.png](attachment:image.png)

Apply the function `lower` to all names in column `Name` in `df`. Assign these new names to a new column of `df` called `Lowercase Name`. 

#### Note: Older tutorials use `from string import upper` or `from string import lower`. Those functions existed only in Python 2 and are not available in Python 3.
#### No import is needed: use the built-in `str.upper` and `str.lower` instead.

In [52]:
import pandas as pd

df = pd.DataFrame(
    [
        ['John SMITH', 'john.smith@gmail.com'],
        ['Jane Doe', 'jdoe@yahoo.com'],
        ['joe schmo', 'joeschmo@hotmail.com']
    
    ],
    columns = ['Name', 'Email']
)

df['Lowercase Name'] = df.Name.apply(str.lower)
print(df)

         Name                 Email Lowercase Name
0  John SMITH  john.smith@gmail.com     john smith
1    Jane Doe        jdoe@yahoo.com       jane doe
2   joe schmo  joeschmo@hotmail.com      joe schmo
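
`.apply()` is not limited to built-in string methods; it accepts any function that takes one value and returns one value. The sketch below uses a hypothetical helper, `email_domain`, and a lambda. The new column names are also invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John SMITH', 'Jane Doe'],
    'Email': ['john.smith@gmail.com', 'jdoe@yahoo.com'],
})

def email_domain(address):
    """Return the part of an email address after the '@' (hypothetical helper)."""
    return address.split('@')[-1]

# A named function and a lambda work the same way.
df['Email Provider'] = df['Email'].apply(email_domain)
df['Uppercase Name'] = df['Name'].apply(lambda name: name.upper())
print(df)
```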


## Renaming Columns

You can change all of the column names at once by setting the `.columns` attribute to a different list. This is convenient when every column needs a new name, but the list must contain one name per column, in the same order as the existing columns.

Example to correctly rename all columns at once:

`df = pd.DataFrame({
    'name': ['John', 'Jane', 'Sue', 'Fred'],
    'age': [23, 29, 21, 18]
})`

`df.columns = ['First Name', 'Age']`

The DataFrame `df` contains data about movies from IMDb.

We want to present this data to some film producers. Right now, our column names are in lower case and are not very descriptive. Modify `df` using the `.columns` attribute to rename the columns.

In [56]:
df = pd.read_csv('imdb.csv')

df.columns = ['ID', 'Title', 'Category', 'Year Released', 'Rating']
print(df.head(10))

   ID                                      Title Category  Year Released  \
0   1                                     Avatar   action           2009   
1   2                             Jurassic World   action           2015   
2   3                               The Avengers   action           2012   
3   4                            The Dark Knight   action           2008   
4   5  Star Wars: Episode I - The Phantom Menace   action           1999   
5   6                                  Star Wars   action           1977   
6   7                    Avengers: Age of Ultron   action           2015   
7   8                      The Dark Knight Rises   action           2012   
8   9  Pirates of the Caribbean: Dead Mans Chest   action           2006   
9  10                                 Iron Man 3   action           2013   

   Rating  
0     7.9  
1     7.3  
2     8.1  
3     9.0  
4     6.6  
5     8.7  
6     7.9  
7     8.5  
8     7.3  
9     7.3  


## Renaming Columns 2
You can also rename individual columns by using the `.rename()` method. Pass a dictionary like the one below to the `columns` keyword argument:
![image.png](attachment:image.png)

Here's an example:
![image.png](attachment:image.png)
The code above will rename `name` to `First Name` and `age` to `Age`.

Using `.rename()` with only the `columns` keyword will create a **new** DataFrame, leaving your original DataFrame unchanged. That's why we also passed in the keyword argument **`inplace=True`**. Using `inplace=True` lets us edit the **original** DataFrame.

There are several reasons why `.rename()` is preferable to `.columns`:
- You can rename just one column
- You can be specific about which column names are getting changed (with `.columns` you can accidentally switch column names if you're not careful)

**Note:** If you misspell one of the original column names, this command won't fail. It just won't change anything.
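
The behaviours described above can be checked with a short sketch. The data is made up, and `'nmae'` is a deliberate misspelling used to show that an unknown key is silently ignored by default.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Jane'], 'age': [23, 29]})

# Without inplace=True, rename returns a new DataFrame.
renamed = df.rename(columns={'name': 'First Name'})
print(df.columns.tolist())
print(renamed.columns.tolist())

# A misspelled original name does not raise; nothing is renamed.
typo = df.rename(columns={'nmae': 'First Name'})
print(typo.columns.tolist())

# With inplace=True, df itself is modified.
df.rename(columns={'age': 'Age'}, inplace=True)
print(df.columns.tolist())
```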

In [59]:
df.rename(columns={
    'Title': 'movie_title'  # the column was originally 'name'; the previous
}, inplace=True)            # cell already renamed it to 'Title'

print(df.head(7))

   ID                                movie_title Category  Year Released  \
0   1                                     Avatar   action           2009   
1   2                             Jurassic World   action           2015   
2   3                               The Avengers   action           2012   
3   4                            The Dark Knight   action           2008   
4   5  Star Wars: Episode I - The Phantom Menace   action           1999   
5   6                                  Star Wars   action           1977   
6   7                    Avengers: Age of Ultron   action           2015   

   Rating  
0     7.9  
1     7.3  
2     8.1  
3     9.0  
4     6.6  
5     8.7  
6     7.9  
