**Column Information**

Before we start altering a DataFrame’s columns, we need to know what the columns are and what kinds of data they contain. We can display a summary table of column information using the .info() method:

In [195]:
import pandas as pd
df = pd.read_csv('filename.csv')
df.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Player Name            10 non-null     object 
 1   Games Played           10 non-null     int64  
 2   Points Per Game        10 non-null     float64
 3   Assists Per Game       10 non-null     float64
 4   Rebounds Per Game      10 non-null     float64
 5   Steals Per Game        10 non-null     float64
 6   Field Goal Percentage  10 non-null     float64
dtypes: float64(5), int64(1), object(1)
memory usage: 688.0+ bytes


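Here is what the output above tells us:
- The first line confirms that df is a DataFrame (.info() can also be called on a Series).
- The second line tells us that the index of df is a range with 10 entries, numbered 0 to 9.
- The third line gives the total number of columns (7).
- '#' is the column's position, counting from 0
- 'Column' is the column name
- 'Non-Null Count' is the number of non-missing values in the column (all 10 here, so nothing is missing)
- 'Dtype' is the column’s data type: object (here, text), int64 for whole numbers, and float64 for decimal numbers
- The last two lines count how many columns have each dtype and estimate the DataFrame's memory usage; the + means the real figure may be higher, because pandas does not fully count the memory used by object columns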
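.info() prints its summary instead of returning it. When we need the same details in code (for example, to check for missing values before cleaning), .dtypes and .isna() return them as a Series. A minimal sketch, using a small hypothetical DataFrame called sample in place of filename.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for filename.csv; one value is missing.
sample = pd.DataFrame({
    'Player Name': ['Player 1', 'Player 2', 'Player 3'],
    'Games Played': [60, 53, 67],
    'Points Per Game': [19.6, np.nan, 27.0],
})

# Data type of each column (the 'Dtype' column of .info()).
print(sample.dtypes)

# Non-missing values per column (the 'Non-Null Count' column of .info()).
print(sample.notna().sum())

# Missing values per column, the complement of the non-null count.
print(sample.isna().sum())
```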

**Renaming Columns**

Sometimes we'll want to rename a column to make it more meaningful or easier to work with. To do so, we can use the .rename() method:

In [196]:
column_mapper = {
    'Points Per Game': 'PPG',
}
df = df.rename(
    mapper=column_mapper,
    axis=1)

Let's break down the above:

- First, we define a dictionary, column_mapper, that maps each old column name to its new name ('old name': 'new name')
- rename() is a pandas method for renaming the rows or columns of a DataFrame. It returns a new DataFrame rather than modifying df, which is why we assign the result back to df
- mapper is the rename() parameter that receives the mapping dictionary of old and new names (here, column_mapper)
- axis tells rename() which labels to change: axis=1 renames columns, while axis=0 (the default) renames rows
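pandas also accepts a columns= keyword, which is equivalent to mapper=..., axis=1. The sketch below, using a small hypothetical DataFrame named stats, shows that spelling, confirms that rename() leaves the original untouched, and shows how errors='raise' catches a misspelled column name that would otherwise be ignored silently:

```python
import pandas as pd

# Small hypothetical DataFrame with two of the example's column names.
stats = pd.DataFrame({'Points Per Game': [19.6, 28.8], 'Games Played': [60, 53]})

# columns= is equivalent to mapper=..., axis=1
renamed = stats.rename(columns={'Points Per Game': 'PPG'})
print(renamed.columns.tolist())  # ['PPG', 'Games Played']

# rename() returns a new DataFrame; stats keeps its original names.
print(stats.columns.tolist())    # ['Points Per Game', 'Games Played']

# By default a key that matches no column is ignored.
# errors='raise' turns the typo below into a KeyError instead.
raised = False
try:
    stats.rename(columns={'Points Per Gme': 'PPG'}, errors='raise')
except KeyError:
    raised = True
print(raised)  # True
```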

**Removing Columns**

If a dataset has columns that are irrelevant, we can use the .drop() method to remove them:

In [197]:
drop_columns = ['Assists Per Game', 'Rebounds Per Game']
df = df.drop(
  labels=drop_columns, 
  axis=1)

print(df)

  Player Name  Games Played   PPG  Steals Per Game  Field Goal Percentage
0    Player 1            60  19.6              1.9                   58.6
1    Player 2            53  28.8              1.0                   46.0
2    Player 3            67  27.0              1.1                   42.0
3    Player 4            79  27.6              1.8                   57.5
4    Player 5            58  18.0              2.5                   44.6
5    Player 6            81  28.3              1.7                   53.5
6    Player 7            54  24.7              2.8                   51.3
7    Player 8            67  11.6              1.8                   54.0
8    Player 9            78  10.3              1.3                   40.3
9   Player 10            81   5.5              2.5                   53.9
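As with rename(), drop() has a columns= shorthand for labels=..., axis=1. Dropping a label that doesn't exist raises a KeyError by default; errors='ignore' skips it instead, which can help when a script may run more than once. A minimal sketch with a hypothetical DataFrame named stats:

```python
import pandas as pd

# Hypothetical sample data with the columns we want to remove.
stats = pd.DataFrame({
    'Player Name': ['Player 1', 'Player 2'],
    'Assists Per Game': [5.1, 7.3],
    'Rebounds Per Game': [8.0, 4.2],
})

# columns= is shorthand for labels=..., axis=1
trimmed = stats.drop(columns=['Assists Per Game', 'Rebounds Per Game'])
print(trimmed.columns.tolist())  # ['Player Name']

# 'Assists Per Game' is already gone; errors='ignore' avoids a KeyError.
same = trimmed.drop(columns=['Assists Per Game'], errors='ignore')
print(same.columns.tolist())     # ['Player Name']
```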


**Rounding Numbers**

In Python, we can use the round() function to round to a specified number of decimals:

In [198]:
round(12.5472837, 2)

12.55

The first argument to round() is the number we want to round; the second is the number of decimal places to keep.
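A few details of the built-in round() are worth knowing before using it on real data. If the second argument is omitted, it returns a whole number as an int; exact halves are rounded to the nearest even number; and because floats are stored in binary, a value that looks like an exact half may not be one. A short sketch:

```python
# Without a second argument, round() returns an int.
whole = round(12.5472837)
print(whole)   # 13

# Exact halves go to the nearest even number ("round half to even").
halves = [round(0.5), round(1.5), round(2.5)]
print(halves)  # [0, 2, 2]

# 2.675 is stored as slightly less than 2.675, so it rounds down.
tricky = round(2.675, 2)
print(tricky)  # 2.67
```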

**Column Calculations in Pandas**

In pandas, an operation on a column is applied element-wise to every row at once, with no explicit loop. For example, the following code uses Python’s division operator / to divide every value in Games Played by 2:


In [199]:
df['Games Played'] / 2

0    30.0
1    26.5
2    33.5
3    39.5
4    29.0
5    40.5
6    27.0
7    33.5
8    39.0
9    40.5
Name: Games Played, dtype: float64
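Two details explain the output above: the column operation gives the same values as looping over the rows, and true division (/) always produces floats, which is why the result is float64 even though Games Played is int64. Floor division (//) keeps integers instead. A sketch using a hypothetical three-row Series named games:

```python
import pandas as pd

games = pd.Series([60, 53, 67], name='Games Played')

# Vectorized: one expression applied to every row.
halved = games / 2
# Equivalent explicit loop, shown only for comparison.
loop_result = [value / 2 for value in games]
print(halved.tolist() == loop_result)  # True
print(halved.dtype)                    # float64

# Floor division keeps integers and drops the fractional part.
floored = games // 2
print(floored.tolist())  # [30, 26, 33]
```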

We can also store the result by assigning it to a new column name:

In [200]:
df['Half of Games Played'] = df['Games Played'] / 2

print(df['Half of Games Played'])

0    30.0
1    26.5
2    33.5
3    39.5
4    29.0
5    40.5
6    27.0
7    33.5
8    39.0
9    40.5
Name: Half of Games Played, dtype: float64


We could also overwrite an existing column with the new values instead of creating a new one (for a column named Col2, that would be df['Col2'] = df['Col2'] / 2).
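The two approaches differ in whether the original DataFrame changes. Bracket assignment modifies df in place, while DataFrame.assign() returns a new DataFrame with the extra column and leaves the original as it was. A minimal sketch; the column name half_games is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Games Played': [60, 53]})

# assign() returns a copy with the new column; df itself is unchanged.
with_half = df.assign(half_games=df['Games Played'] / 2)
print(with_half.columns.tolist())  # ['Games Played', 'half_games']
print(df.columns.tolist())         # ['Games Played']

# Overwriting in place replaces the original integers with halved floats.
df['Games Played'] = df['Games Played'] / 2
print(df['Games Played'].tolist())  # [30.0, 26.5]
```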

**Combining Columns**

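We can also do arithmetic between two columns. pandas pairs up the values row by row (by index label), so the code below adds each player's PPG to their Games Played: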

In [201]:
df['newcolumn'] = df['PPG'] + df['Games Played']

df['newcolumn']

0     79.6
1     81.8
2     94.0
3    106.6
4     76.0
5    109.3
6     78.7
7     78.6
8     88.3
9     86.5
Name: newcolumn, dtype: float64
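Because pandas pairs values by index label rather than by position, combining two Series whose labels only partly overlap yields NaN wherever one side has no partner. Within a single DataFrame the labels always match, so this only comes up when combining data from different sources. A sketch with two hypothetical Series, a and b:

```python
import pandas as pd

# Two hypothetical Series whose index labels only partly overlap.
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])

# Labels 0 and 3 appear in only one Series, so their sums are NaN.
total = a + b
print(total)

# fill_value treats a missing partner as 0 instead of producing NaN.
filled = a.add(b, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```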

pandas Series also have a .round() method, which works like the built-in round() function but on every value in the column at once:

In [202]:
df['newcolumn'] = df['newcolumn'].round(decimals=0)  # round to a whole number

df['newcolumn']

0     80.0
1     82.0
2     94.0
3    107.0
4     76.0
5    109.0
6     79.0
7     79.0
8     88.0
9     86.0
Name: newcolumn, dtype: float64
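Notice row 9: 86.5 became 86.0, not 87.0. Series.round() also rounds exact halves to the nearest even number. The result also stays float64, so whole-number output needs an explicit conversion, and negative decimals round to tens, hundreds, and so on. A sketch with a hypothetical Series named values:

```python
import pandas as pd

values = pd.Series([79.6, 86.5, 87.5])

# 86.5 and 87.5 both round to the even neighbour.
rounded = values.round(decimals=0)
print(rounded.tolist())  # [80.0, 86.0, 88.0]

# The result is still float64; convert it if integers are needed.
as_int = rounded.astype('int64')
print(as_int.tolist())   # [80, 86, 88]

# Negative decimals round to the nearest ten.
tens = values.round(decimals=-1)
print(tens.tolist())     # [80.0, 90.0, 90.0]
```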

**Splitting and Combining Text Columns**

Now let's split a column that stores the city and state together, separated by a comma (for example, "New York, NY"), into two separate columns:


In [203]:
locations = pd.read_csv('Locations.csv')

In [204]:
locations_split = locations['Location'].str.split(
    pat=',',
    expand=True)


To explain the above syntax:

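- .str is the string accessor: it tells pandas to treat each entry of the Location column as a Python string and apply the following string method to every row
- .split() splits each string wherever the delimiter (the character or pattern that separates the values) appears; here, that delimiter is a comma
- The pat parameter specifies the delimiter. It can be any string or pattern, such as a space ' '; if pat is omitted, the strings are split on whitespace
- expand=True returns a DataFrame with one column per piece (labelled 0, 1, ...) instead of a Series of lists
- Because we split on ',' alone, the space after each comma stays attached to the second piece, so the State values begin with a space (' NY' rather than 'NY')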
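The leftover space is easy to miss in printed output, so it helps to check it directly. The sketch below, using two hypothetical values in the same "City, ST" format, shows the leading space, two ways to avoid it, and the n parameter, which limits how many splits are made:

```python
import pandas as pd

# Two hypothetical values in the same "City, ST" format as Location.
location = pd.Series(['New York, NY', 'Los Angeles, CA'])

parts = location.str.split(pat=',', expand=True)
print(parts[1].tolist())  # [' NY', ' CA'] (space kept)

# Option 1: split on comma-plus-space.
no_space = location.str.split(pat=', ', expand=True)
print(no_space[1].tolist())  # ['NY', 'CA']

# Option 2: split on ',' and strip each piece afterwards.
stripped = parts[1].str.strip()
print(stripped.tolist())     # ['NY', 'CA']

# n=1 splits only at the first delimiter; the rest stays together.
limited = pd.Series(['a,b,c']).str.split(pat=',', n=1, expand=True)
print(limited.iloc[0].tolist())  # ['a', 'b,c']
```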

In [205]:
locations['City'] = locations_split[0]
locations['State'] = locations_split[1]

locations

Unnamed: 0,Location,City,State
0,"New York, NY",New York,NY
1,"Los Angeles, CA",Los Angeles,CA
2,"Chicago, IL",Chicago,IL
3,"Houston, TX",Houston,TX
4,"Phoenix, AZ",Phoenix,AZ
5,"Philadelphia, PA",Philadelphia,PA
6,"San Antonio, TX",San Antonio,TX
7,"San Diego, CA",San Diego,CA
8,"Dallas, TX",Dallas,TX
9,"San Jose, CA",San Jose,CA


In [206]:
locations = locations.drop(
    labels='Location',
    axis=1
)

locations

Unnamed: 0,City,State
0,New York,NY
1,Los Angeles,CA
2,Chicago,IL
3,Houston,TX
4,Phoenix,AZ
5,Philadelphia,PA
6,San Antonio,TX
7,San Diego,CA
8,Dallas,TX
9,San Jose,CA


To go the other way and join two text columns into one, we can use .str.cat():

In [207]:
locations['Combined'] = locations['City'].str.cat(
    locations['State'],
    sep=','
)

locations

Unnamed: 0,City,State,Combined
0,New York,NY,"New York, NY"
1,Los Angeles,CA,"Los Angeles, CA"
2,Chicago,IL,"Chicago, IL"
3,Houston,TX,"Houston, TX"
4,Phoenix,AZ,"Phoenix, AZ"
5,Philadelphia,PA,"Philadelphia, PA"
6,San Antonio,TX,"San Antonio, TX"
7,San Diego,CA,"San Diego, CA"
8,Dallas,TX,"Dallas, TX"
9,San Jose,CA,"San Jose, CA"


For the above syntax:

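- .str.cat() concatenates the text in State onto the end of the text in City, row by row
- sep=',' is the string placed between the two values. The space in results such as "New York, NY" comes from the State values, which still begin with the space left over from splitting on ','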
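sep is inserted exactly as written, so a separator like ', ' must include the space explicitly. By default a missing value in either column makes that row's result missing; na_rep substitutes a placeholder instead. For columns without missing values, the + operator gives the same result. A sketch with hypothetical city and state Series, one city missing:

```python
import pandas as pd

# Hypothetical data; the third city is missing.
city = pd.Series(['New York', 'Chicago', None])
state = pd.Series(['NY', 'IL', 'TX'])

# Missing values propagate: the third result is NaN.
joined = city.str.cat(state, sep=', ')
print(joined.tolist())

# na_rep fills in a placeholder for the missing city.
filled = city.str.cat(state, sep=', ', na_rep='?')
print(filled.tolist())  # ['New York, NY', 'Chicago, IL', '?, TX']

# For complete rows, + gives the same result as str.cat().
plus = city.iloc[:2] + ', ' + state.iloc[:2]
print(plus.tolist())    # ['New York, NY', 'Chicago, IL']
```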

We can use the .lower(), .upper(), and .title() methods to change the case of the letters. As with .split() and .cat(), each one must be called through the .str accessor:

In [208]:
locations['City'].str.upper()

0        NEW YORK
1     LOS ANGELES
2         CHICAGO
3         HOUSTON
4         PHOENIX
5    PHILADELPHIA
6     SAN ANTONIO
7       SAN DIEGO
8          DALLAS
9        SAN JOSE
Name: City, dtype: object
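A common use for these methods is normalizing case before comparing or matching text, since 'NEW YORK' and 'new york' are different strings. A sketch with a hypothetical Series of inconsistently capitalized city names:

```python
import pandas as pd

# Hypothetical, inconsistently capitalized city names.
names = pd.Series(['san jose', 'NEW YORK', 'los angeles'])

print(names.str.title().tolist())  # ['San Jose', 'New York', 'Los Angeles']
print(names.str.upper().tolist())  # ['SAN JOSE', 'NEW YORK', 'LOS ANGELES']

# Lower-casing first makes the comparison case-insensitive.
matches = names.str.lower() == 'new york'
print(matches.tolist())            # [False, True, False]
```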

The .str.strip() method removes whitespace from the start and end of each string; whitespace inside the string, such as the space in "New York", is left alone. None of the City values have extra whitespace, so the output here is unchanged:

In [209]:
locations['City'].str.strip()

0        New York
1     Los Angeles
2         Chicago
3         Houston
4         Phoenix
5    Philadelphia
6     San Antonio
7       San Diego
8          Dallas
9        San Jose
Name: City, dtype: object
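To see the effect, we need values that actually have extra whitespace. The sketch below uses hypothetical messy values to show strip(), its one-sided variants lstrip() and rstrip(), and a regex replace() that collapses repeated inner spaces, which strip() does not touch:

```python
import pandas as pd

# Hypothetical values with leading, trailing, and doubled inner spaces.
messy = pd.Series(['  Dallas  ', 'San  Jose '])

print(messy.str.strip().tolist())   # ['Dallas', 'San  Jose']
print(messy.str.lstrip().tolist())  # ['Dallas  ', 'San  Jose ']
print(messy.str.rstrip().tolist())  # ['  Dallas', 'San  Jose']

# Collapse runs of inner whitespace to a single space.
clean = messy.str.strip().str.replace(r'\s+', ' ', regex=True)
print(clean.tolist())               # ['Dallas', 'San Jose']
```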

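The .str.replace() method replaces every occurrence of a pattern with another string. Here pat is the text to find, repl is its replacement, and regex=False tells pandas to treat pat as a literal string rather than a regular expression: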

In [210]:
locations['Combined'] = locations['Combined'].str.replace(
    pat=',',
    repl=';',
    regex=False 
)

locations

Unnamed: 0,City,State,Combined
0,New York,NY,New York; NY
1,Los Angeles,CA,Los Angeles; CA
2,Chicago,IL,Chicago; IL
3,Houston,TX,Houston; TX
4,Phoenix,AZ,Phoenix; AZ
5,Philadelphia,PA,Philadelphia; PA
6,San Antonio,TX,San Antonio; TX
7,San Diego,CA,San Diego; CA
8,Dallas,TX,Dallas; TX
9,San Jose,CA,San Jose; CA
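The regex setting matters when the pattern contains characters with special meaning in regular expressions. In a regex, '.' matches any character, so replacing it with regex=True wipes out the whole string, while regex=False removes only literal periods. A sketch with a hypothetical city name:

```python
import pandas as pd

s = pd.Series(['St. Louis, MO'])

# regex=False: '.' is a literal period.
literal = s.str.replace('.', '', regex=False)
print(literal.tolist())  # ['St Louis, MO']

# regex=True: '.' matches every character, so everything is removed.
wildcard = s.str.replace('.', '', regex=True)
print(wildcard.tolist())  # ['']
```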


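To change the data type of a column, pass the target dtype to .astype(). Here State is already an object (text) column, so this call leaves it unchanged; it only illustrates the syntax: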

In [211]:
locations['State'] = locations['State'].astype('object')
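Conversions that actually change a column are more common in practice, such as turning numbers stored as text into numeric values, or storing a few repeated labels as the 'category' dtype. .astype() fails if any value can't be converted; pd.to_numeric(..., errors='coerce') turns such values into NaN instead. A sketch with hypothetical Series named raw and states:

```python
import pandas as pd

# Hypothetical numbers stored as text, with one bad entry.
raw = pd.Series(['60', '53', 'n/a'])

# astype() raises an error on values it cannot convert.
failed = False
try:
    raw.astype('int64')
except (ValueError, TypeError):
    failed = True
print(failed)  # True

# to_numeric with errors='coerce' turns bad values into NaN.
numbers = pd.to_numeric(raw, errors='coerce')
print(numbers.tolist())

# 'category' stores each distinct value once.
states = pd.Series(['TX', 'CA', 'TX', 'CA']).astype('category')
print(states.cat.categories.tolist())  # ['CA', 'TX']
```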