In [None]:
# Before anything, let's import our libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# An Introduction to NumPy and Pandas
Tanner Bonner | January 2023

# What is NumPy?
Read more at https://numpy.org/doc/stable/user/whatisnumpy.html
### Summary from their website
“NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, basic linear algebra, basic statistical operations, random simulation and much more.”
### Fundamental object: the ndarray ("NumPy array")
An object that resembles a list (or a list of lists, and so on). Some key differences between an ndarray and a typical Python list:
* All elements are the same data type, and thus the same size in memory
 * e.g. can't mix strings and integers together, unlike Python lists
* Fixed size in memory at creation, thus consuming less memory to store data
 * unlike Python lists, which follow a dynamic array model, leaving extra space at creation to grow/shrink as needed
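
The two properties above can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Mixing integers and a float: NumPy converts everything to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64

# Mixing numbers and strings: every element ends up stored as a string
as_text = np.array([1, "two", 3])
print(as_text.dtype.kind)  # U (Unicode string)

# Every element occupies the same number of bytes
ints = np.array([1, 2, 3], dtype=np.int64)
print(ints.itemsize)       # 8 bytes per element
print(ints.nbytes)         # 24 bytes for the three elements
```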
 
### The significance of NumPy arrays
* Facilitates operations on large amounts of data more efficiently and with less code than typical Python
 * Allows “vectorized code” - no explicit looping or indexing, which happens behind the scenes in optimized, pre-compiled C code.
 * Easier to read, fewer lines of code, and more closely resembles mathematical notation
* Many scientific and mathematical Python libraries are built upon NumPy arrays, converting Python lists to NumPy arrays in pre-processing and outputting NumPy arrays

An interesting quote: “One needs to know how to use NumPy arrays to efficiently use much of today’s scientific/mathematical Python-based software.” Only knowing how to use Python’s built-in sequence types is deemed “insufficient” in the world of Python big data processing to achieve reasonable runtimes and memory usage.

In this notebook, we'll explore only a portion of the functionality provided by NumPy, including:
* Creating NumPy Arrays
* Investigating Properties of NumPy Arrays
* Arithmetic with NumPy Arrays
* Indexing and Filtering NumPy Arrays
* Basic Statistics with NumPy Arrays

## Creating NumPy Arrays - Some General Mechanisms
See more at https://numpy.org/doc/stable/user/basics.creation.html

#### 1: Conversion from Python structures (i.e. lists and tuples) - use np.array(data)

From a list of integers:

In [None]:
sample_list = [1, 2, 3, 4]
oned_array = np.array(sample_list)
print(type(oned_array))
oned_array

From a list of strings, converting to integers with dtype=int:

In [None]:
sample_list_2 = ['1', '2', '3', '4']
oned_array_2 = np.array(sample_list_2, dtype=int)
oned_array_2

From a list of strings, converting to decimal ('floating point') values:

In [None]:
sample_list_3 = ['1.5', '2.0', '2.5', '3.0']
oned_array_3 = np.array(sample_list_3, dtype=float)
oned_array_3

From a list of lists of integers (a '2D' array):

In [None]:
sample_list_2 = [[1, 2], [3, 4]]
twod_array = np.array(sample_list_2)
twod_array

From a tuple:

In [None]:
sample_tuple = (1, 2, 3, 4)
array_2 = np.array(sample_tuple)  # convert the tuple defined above
array_2

From a mixed data structure (here, a list of one-element tuples):

In [None]:
sample_data = [(1,), (2,), (3,)]
array_3 = np.array(sample_data)
array_3

#### 2: Intrinsic and general NumPy array creation functions

np.linspace(start, stop, num) - generate evenly spaced numbers (default floating point) within a specified range (inclusive)

In [None]:
# E.g.: generate all integers from 1 to 10
ex_1 = np.linspace(start=1, stop=10, num=10, dtype=int)
ex_1

np.arange(start, stop, step) - generate evenly spaced numbers, separated by a specified step, within the half-open range [start, stop): start is included and stop is excluded (similar to Python's range() function, and unlike np.linspace())

In [None]:
# e.g.: generate all even integers from 2 to 10 (note stop must be end + 1)
ex_2 = np.arange(start=2, stop=11, step=2) 
ex_2
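
To make the difference between the two functions concrete, here is a small sketch over the same range (the variable names are illustrative). np.linspace is given the number of points and includes the stop value, while np.arange is given the step and excludes it.

```python
import numpy as np

# linspace: choose HOW MANY points; the stop value is included
points = np.linspace(start=0, stop=1, num=5)
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# arange: choose the STEP between points; the stop value is excluded
steps = np.arange(start=0, stop=1, step=0.25)
print(steps.tolist())   # [0.0, 0.25, 0.5, 0.75]
```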

np.ones(size) - generate an array of ones (default floating point) of a specified length

In [None]:
# e.g.: generate ten 1s of integer value
ex_3 = np.ones(10, dtype=int)
ex_3

np.zeros(size) - generate an array of zeros (default floating point) of a specified length

In [None]:
# e.g.: generate ten 0s of integer value
ex_4 = np.zeros(10, dtype=int)
ex_4

np.random.randint(low, high, size) - generate a specified number of random integers (uniform distribution) from the half-open range [low, high): low is included and high is excluded

In [None]:
# e.g.: generate 10 integers between 1 and 5
ex_5 = np.random.randint(low=1, high=6, size=10)
ex_5

np.random.random_sample(size) - generate floats randomly (uniform) from the interval [0.0, 1.0)

In [None]:
# e.g.: generate 10 floats in the range [0.0, 1.0)
ex_6 = np.random.random_sample(size=10)
ex_6

#### 3: Replicating or joining existing NumPy arrays

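Replicating with .copy() (note that a slice such as ex_7[:] returns a view that shares memory with ex_7, not an independent copy):

In [None]:
ex_7 = np.array([1, 2, 3, 4])
ex_8 = ex_7.copy()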
ex_8
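
One point worth checking when replicating arrays: unlike a Python list, slicing a NumPy array with [:] produces a view onto the same data rather than an independent copy. The sketch below (illustrative variable names) contrasts a slice with .copy().

```python
import numpy as np

original = np.array([1, 2, 3, 4])
view = original[:]             # a view: shares memory with the original
independent = original.copy()  # a true copy: has its own memory

original[0] = 99               # modify the original array

print(view[0])                 # 99 - the view sees the change
print(independent[0])          # 1  - the copy does not
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```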

Concatenating/joining using np.block([]):

In [None]:
# e.g. join array ex_8 to the end of ex_7
ex_9 = np.block([ex_7, ex_8]) # Note list input and order
ex_9

#### 4: Reading arrays from disk

A common method is to use np.genfromtxt(filename, delimiter, dtype). Another method is np.loadtxt(filename, delimiter, dtype), but np.genfromtxt provides additional functionality (such as the missing_values and filling_values parameters).

In [None]:
from_csv = np.genfromtxt("sample-data.csv", delimiter=",", dtype=float)
from_csv

Small caveat: make sure your .CSV file is UTF-8 and not UTF-8-BOM (which might be generated by Excel) - if it is UTF-8-BOM, the first element will be "nan" instead of the desired value. You can change the encoding by opening the .CSV in Notepad++ and re-saving it as UTF-8.
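
To see how np.genfromtxt treats missing fields without needing a file on disk, the sketch below reads from an in-memory string instead. The CSV contents are made up for illustration; io.StringIO simply stands in for a file.

```python
import io
import numpy as np

# Hypothetical CSV contents with one empty field in the second row
csv_text = "1,2,3\n4,,6\n"

# By default, an empty field is read as nan
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=float)
print(with_nan)

# filling_values substitutes a chosen value for missing fields
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=float,
                       filling_values=0)
print(filled)
```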

## Investigating Properties of NumPy Arrays

.shape - returns a tuple specifying the size along each array dimension

In [None]:
# .shape - the size along each array dimension; returns a tuple
from_csv.shape

In [None]:
num_entries = from_csv.shape[0]
num_entries

.ndim - the number of dimensions of the array

In [None]:
# .ndim - the number of dimensions of the array
from_csv.ndim
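
Since the contents of sample-data.csv are not shown here, the following sketch uses small hand-made arrays (illustrative names) to show what these properties report.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns
print(grid.shape)     # (2, 3)
print(grid.ndim)      # 2
print(grid.shape[0])  # 2 - the number of rows

flat = np.array([1, 2, 3])
print(flat.shape)     # (3,) - a one-element tuple for a 1D array
print(flat.ndim)      # 1
```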

## Arithmetic with NumPy Arrays

In typical Python, to add a number to an entire list or to multiply an entire list by a value, you must iterate through and replace all of the values as shown below.

In [None]:
example_list = [1, 2, 3, 4]
add_by = 1
for i in range(len(example_list)):
    example_list[i] += add_by
example_list

In [None]:
multiply_by = 3
for i in range(len(example_list)):
    example_list[i] *= multiply_by
example_list

Using NumPy, you can do these operations in one line.

In [None]:
example_list = [1, 2, 3, 4]
add = 1
ex_array = np.array(example_list)
ex_array = ex_array + add
ex_array

In [None]:
multiply = 3
ex_array = ex_array * multiply
ex_array

You can also do element-wise arithmetic between two arrays much more easily. In typical Python, it would look like this:

In [None]:
example_list = [1, 2, 3, 4]
example_list_2 = [3, 4, 5, 6]
example_list_3 = []
for i in range(len(example_list_2)):
    example_list_3.append(example_list[i]+example_list_2[i])
example_list_3

Using NumPy, the for loop and indexing is simplified into a single operation:

In [None]:
ex_array = np.array([1, 2, 3, 4])
ex_array_2 = np.array([3, 4, 5, 6])
ex_array_3 = ex_array + ex_array_2
ex_array_3
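
Element-wise arithmetic pairs up elements by position, so two 1D arrays need the same length. The sketch below (illustrative names) shows addition and multiplication, plus the error raised when the lengths differ.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5, 6])
print((a + b).tolist())  # [4, 6, 8, 10]
print((a * b).tolist())  # [3, 8, 15, 24]

# Arrays of different lengths (4 vs. 3) cannot be paired element by element
short = np.array([1, 2, 3])
try:
    a + short
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # ValueError
```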

## Indexing and Filtering NumPy Arrays

Slicing - [start:stop:step] - works similarly to Python lists

In [None]:
ex_array = np.array([1, 2, 3, 4, 5, 6])
# e.g. retrieve the third and fourth elements
ex_array[2:4]

In [None]:
# e.g. retrieve the first, third, and fifth elements
ex_array[0:6:2] 

Filter based on conditional "true/false" evaluation

In [None]:
# e.g. retrieve all values greater than or equal to 4
ex_array[ex_array >= 4]

In [None]:
# e.g. retrieve all values greater than 1 but less than 4
ex_array[(ex_array > 1) & (ex_array < 4)] # use "&" for "and" and "|" for "or"

In [None]:
# e.g. retrieve all values less than 2 or greater than 4
ex_array[(ex_array < 2) | (ex_array > 4)]
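
It can help to look at the intermediate step in these filters. A comparison on an array produces a second array of True/False values (often called a mask), and indexing with that mask keeps only the positions that are True. A small sketch with illustrative names:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

mask = values >= 4            # element-wise comparison
print(mask.tolist())          # [False, False, False, True, True, True]
print(values[mask].tolist())  # [4, 5, 6]

# True counts as 1, so summing a mask counts the matching elements
print(int(mask.sum()))        # 3
```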

## Basic Statistics with NumPy Arrays

NumPy array methods - min() and max()

In [None]:
ex_array = np.array([1, 4, 6, 3, 5, 3, 7, 3, 2, 1, 3, 2])
print("Minimum: " + str(ex_array.min()))
print("Maximum: " + str(ex_array.max()))

NumPy functions - np.percentile(arr, percent), np.median(arr), np.mean(arr), np.std(arr), np.var(arr), np.sum(arr)

In [None]:
print('1st Quartile: ' + str(np.percentile(ex_array, 25)))
print('Median: ' + str(np.median(ex_array)))
print('Mean: ' + str(np.mean(ex_array)))
print('Standard Deviation: ' + str(np.std(ex_array)))
print('Variance: ' + str(np.var(ex_array)))
print('Sum of Values: ' + str(np.sum(ex_array)))
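
One detail to be aware of: by default, np.std and np.var divide by N (the population formulas). Passing ddof=1 divides by N - 1 instead, which gives the sample versions. The data below is made up for illustration.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_std = np.std(data)      # divides by N (the default, ddof=0)
sample_std = np.std(data, ddof=1)  # divides by N - 1

print(population_std)  # 2.0
print(sample_std)      # slightly larger than the population value
```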

# What is Pandas?
Read more at https://en.wikipedia.org/wiki/Pandas_(software) 
### Summary from Wikipedia
Pandas is a Python library for data manipulation and analysis, offering data structures and operations for manipulating tables and time series data. Its name is a play on the phrase "Python data analysis" and is derived from the term "panel data" - an econometrics term for data sets that include observations over multiple periods for the same individuals. It is built upon the NumPy library.

### Fundamental object: the DataFrame
The DataFrame represents a two-dimensional data table ("tabular" data) with rows that have a given 'index' (or identifier), and columns. Columns can be different data types from each other. Think of DataFrames as spreadsheets with a fixed number of rows and columns, where each row represents an "entry" of the data set, and each column specifies some sort of variable or attribute (if the dataset is tidy as-is).

### The significance of Pandas
Similarly to NumPy, there are many operational benefits to using Pandas as a tool for data analysis versus typical Python. Along with benefits stemming from optimization, there is a suite of tools that makes it easy to parse and clean data sets. A few of the benefits are listed below:
* Many inbuilt methods available for fast data manipulation made possible with vectorisation
* Data alignment and integrated handling of missing data
* Label-based slicing, fancy indexing, and subsetting of large datasets
* Data structure column insertion and deletion
* Data set merging and joining
* Data filtration
* Data filling
* Statistical analysis

In the examples below, we'll start with a basic DataFrame and then take a look at Freight Analysis Framework 5 data for 2017 and 2050, downloaded from https://faf.ornl.gov/faf5/dtt_total.aspx. The portions of Pandas functionality that we'll explore in this notebook include:
* Creating a DataFrame
* Investigating Properties of DataFrames
* Organizing and Cleaning Data in DataFrames
* Sorting, Iterating, and Aggregation in DataFrames

## Creating a DataFrame - Some General Mechanisms

#### 1: Conversion from Python data structures - primary method: pd.DataFrame(data, index, columns) 

From a list:

In [None]:
names = ['Tanner', 'Rose', 'Joe', 'Sophie']
df = pd.DataFrame(names, columns=['first_name'])
df

From a dictionary containing aligned lists -  keys = columns, lists with entries = rows

In [None]:
people = { 'first_name': names, 
          'last_name': ['Bonner', 'McCarron', 'Delorto', 'Fox'], 
          'role': ['Analyst', 'Manager', 'Analyst', 'Analyst']}
df = pd.DataFrame(people)
df
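
The index parameter named in the heading above is not used in these examples, so here is a brief sketch of it. The labels and values are hypothetical; the point is that custom row labels can then be used for lookups with .loc.

```python
import pandas as pd

# Hypothetical data with custom row labels supplied through index
staff = pd.DataFrame(
    {"role": ["Analyst", "Manager"], "years": [2, 7]},
    index=["emp_a", "emp_b"],
)

print(staff.index.tolist())        # ['emp_a', 'emp_b']
print(staff.loc["emp_b", "role"])  # Manager
```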

#### 2: Reading tables from disk

Some options include pd.read_csv(filename) and pd.read_excel(filename)

In [None]:
faf_data = pd.read_csv('faf-numpy-pandas-testing.csv')
faf_data

## Investigating Properties of DataFrames

.head(num_rows) will print out the first num_rows of a DataFrame (by default, num_rows is 5).

In [None]:
faf_data.head()

.columns returns an Index (a list-like object) of all of the column names of the DataFrame

In [None]:
# Get # of columns and all of the column names
print('Number of columns: ' + str(len(faf_data.columns)))
print('Column names:')
for column in faf_data.columns:
    print(column)

.shape (recall from NumPy) returns a tuple of (rows, columns) for the DataFrame

In [None]:
print(faf_data.shape)
print("Number of Rows: " + str(faf_data.shape[0]))

.size returns the number of rows x columns (total data 'entries')

In [None]:
faf_data.size

.dtypes returns the data types of each column - note 'object' is string-like

In [None]:
faf_data.dtypes

## Organizing and Cleaning Data in DataFrames

#### First, let's replace some of these column names with something more readable.

We'll use the method df.rename(columns=dict, inplace=True) where dict is a dictionary mapping original to new column names.

Most operations occur as a copy of the DataFrame, thus in most cases you must reassign the variable or specify 'inplace = True' when applicable.

In [None]:
# Rename columns - provide a dictionary of original:new column names
columns_rename = {
    'dms_orig': 'Origin',
    'dms_dest': 'Destination',
    'sctg2': 'Commodity Type',
    'dms_mode': 'Mode',
    'thousand tons in 2017': 'Thousand Tons (2017)',
    'thousand tons in 2050': 'Thousand Tons (2050)'
}
faf_data.rename(columns=columns_rename, inplace = True)

In [None]:
# Much better!
for column in faf_data.columns:
    print(column)
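
The copy-versus-in-place behavior mentioned above is easy to verify on a tiny DataFrame. In this sketch (made-up values, illustrative names), calling .rename() without inplace=True leaves the original untouched until the result is reassigned.

```python
import pandas as pd

table = pd.DataFrame({"dms_orig": [1, 2], "dms_dest": [3, 4]})

# Without inplace=True, rename returns a new DataFrame
renamed = table.rename(columns={"dms_orig": "Origin"})
print(table.columns.tolist())    # ['dms_orig', 'dms_dest'] - unchanged
print(renamed.columns.tolist())  # ['Origin', 'dms_dest']

# Reassigning the variable is the alternative to inplace=True
table = table.rename(columns={"dms_orig": "Origin"})
print(table.columns.tolist())    # ['Origin', 'dms_dest']
```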

#### Now, let's check for any N/A values. To do this, we will use the df.isna() method, which returns a DataFrame of True/False values, one for each data point in our DataFrame.

In [None]:
is_na = faf_data.isna()
print(is_na)

We can do a column-by-column search of any N/A values as well, finding which entries have an N/A value by using filtering and .index to retrieve such indices where the condition is true.

In [None]:
for column in faf_data.columns:
    is_na_in_column = faf_data[column].isna()
    print(faf_data[is_na_in_column].index)  # the boolean Series can be used as a filter directly

In this case, we have no N/A values (yay!).
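
Because the FAF data has no missing values, the check above prints only empty indices. As a sketch of what it looks like when values are missing, the made-up DataFrame below contains one N/A; chaining .sum() after .isna() is a compact way to count missing values per column.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in the 'Tons' column
sample = pd.DataFrame({"Origin": [11, 12, 13], "Tons": [1.5, np.nan, 2.0]})

# True counts as 1, so summing gives the number of missing values per column
missing_per_column = sample.isna().sum()
print(int(missing_per_column["Origin"]))  # 0
print(int(missing_per_column["Tons"]))    # 1

# Row labels of entries with at least one missing value
rows_with_missing = sample[sample.isna().any(axis=1)].index.tolist()
print(rows_with_missing)                  # [1]
```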

#### Next, let's replace the 'Commodity Type' and 'Mode' columns to remove their original "x[y]-" prefixes. We will do this by mapping the original values to our desired values, similar to replacing the column names above.

First, we will get the unique values from the original column values. We'll do this by using the .unique() method on the column.

In [None]:
orig_comm_vals = faf_data["Commodity Type"].unique()
orig_comm_vals

In [None]:
orig_mode_vals = faf_data["Mode"].unique()
orig_mode_vals

Second, we will create lists of the replacement values, where each value is a sliced version of itself, removing the prefix as desired.

In [None]:
new_comm_vals = [val[3:] for val in orig_comm_vals] # Slicing off the prefix of each value
new_comm_vals

In [None]:
new_mode_vals = [val[2:] for val in orig_mode_vals] # Slicing off the prefix of each value
new_mode_vals

Next, we set up dictionaries mapping the original values to the replacement values. To help us do this, we'll use "zip" which pairs elements from distinct iterable data structures together.

In [None]:
comm_replace = { orig: new for orig, new in zip(orig_comm_vals, new_comm_vals)}
comm_replace

In [None]:
mode_replace = { orig: new for orig, new in zip(orig_mode_vals, new_mode_vals)}
mode_replace

Lastly, we replace the values in each column using our dictionary mappings that we created. We will use the form .replace(values), where values is a dictionary.

In [None]:
faf_data['Commodity Type'] = faf_data['Commodity Type'].replace(comm_replace)
faf_data['Mode'] = faf_data['Mode'].replace(mode_replace)

Note: you could also use the following syntax with inplace=True to change the values across the whole DataFrame. However, it is safer to replace values one column at a time, because a DataFrame-wide replace could lead to unwanted results depending upon our data.
* When might we have an unwanted result? (hint: it wouldn't happen with this dataset)

In [None]:
faf_data.replace(comm_replace, inplace=True)
faf_data.replace(mode_replace, inplace=True)
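
As one possible illustration of the question above, consider a made-up DataFrame in which a value from the mapping also appears in a different column. A DataFrame-wide replace changes it everywhere, while the per-column form only touches the intended column. The 'Note' column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: "2-Rail" appears in 'Mode' and also in an unrelated column
df = pd.DataFrame({"Mode": ["1-Truck", "2-Rail"], "Note": ["2-Rail", "ok"]})
mapping = {"1-Truck": "Truck", "2-Rail": "Rail"}

# DataFrame-wide replace: every column is searched
whole = df.replace(mapping)
print(whole["Note"].tolist())    # ['Rail', 'ok'] - changed unintentionally

# Per-column replace: only 'Mode' is touched
per_col = df.copy()
per_col["Mode"] = per_col["Mode"].replace(mapping)
print(per_col["Note"].tolist())  # ['2-Rail', 'ok'] - left alone
```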

Checking our results by looking at the value set in each of the replaced columns:

In [None]:
faf_data['Mode'].unique()

In [None]:
faf_data['Commodity Type'].unique()

#### Let's create a new column in our DataFrame representing the difference in tons from 2017 to 2050. 

To create a new column, we index on our new column name and assign its column-wise calculation as follows, where arithmetic is performed in a NumPy fashion.

In [None]:
faf_data['Difference in Tons (2017 to 2050)'] = (faf_data['Thousand Tons (2050)'] - faf_data['Thousand Tons (2017)']) * 1000

#### We'd like to know more information specifically about freight movement via trucks. To start, we can create a new DataFrame with only entries where the 'Mode' is 'Truck'.

To filter, we'll index the DataFrame on the condition desired and make a copy using .copy() to prevent warnings from Pandas.

In [None]:
faf_trucks = faf_data[faf_data['Mode'] == 'Truck'].copy()

Checking our results by seeing all unique values for the 'Mode' column in our new DataFrame:

In [None]:
faf_trucks['Mode'].unique()

So what's going on here? 
Recall how Pandas is built upon NumPy, and in NumPy, operations like arithmetic and comparisons are done element-wise across an entire array. By specifying "faf_data['Mode'] == 'Truck'", we get back a Pandas Series with the "bool" data type: a one-dimensional, ordered sequence of True/False values, one per entry, from comparing each entry's 'Mode' value to 'Truck'. This is then used to filter the DataFrame to retrieve only entries where the evaluation is True - that is, its 'Mode' is 'Truck'.

In [None]:
print(faf_data['Mode'] == 'Truck')

#### Let's add some new columns to our DataFrame estimating the number of trucks for each year, 2017 and 2050. The FTA recommends a conversion factor of 20 tons per truck.

In [None]:
faf_trucks['Trucks (2017)'] = (faf_trucks['Thousand Tons (2017)'] / 20) * 1000
faf_trucks['Trucks (2050)'] = (faf_trucks['Thousand Tons (2050)'] / 20) * 1000
faf_trucks.head()

Now, it's somewhat awkward to have fractional trucks such as "469.120". Let's change the data type of our trucks columns to be integers, rather than floats. We'll do this with the .astype() method. Note that .astype(int) drops the decimal part (truncates) rather than rounding.

In [None]:
faf_trucks['Trucks (2017)'] = faf_trucks['Trucks (2017)'].astype(int)
faf_trucks['Trucks (2050)'] = faf_trucks['Trucks (2050)'].astype(int)
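
Keep in mind that converting floats with .astype(int) discards the decimal part instead of rounding to the nearest integer. If rounding is preferred, one option is to call .round() first. A small sketch with made-up values:

```python
import pandas as pd

trucks = pd.Series([469.12, 469.87])

truncated = trucks.astype(int)         # decimal part is dropped
rounded = trucks.round().astype(int)   # round to nearest, then convert

print(truncated.tolist())  # [469, 469]
print(rounded.tolist())    # [469, 470]
```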

Checking our results with .dtypes:

In [None]:
print(faf_trucks.dtypes)
faf_trucks.head()

Finally, let's create a column specifying the difference in estimated trucks from 2017 to 2050 and check that it's the proper data type. Since we're only checking one column here, we'll use .dtype instead of .dtypes.

In [None]:
faf_trucks['Difference in Trucks (2017 to 2050)'] = faf_trucks['Trucks (2050)'] - faf_trucks['Trucks (2017)']
print(faf_trucks['Difference in Trucks (2017 to 2050)'].dtype)
faf_trucks.head()

## Sorting, Iterating, and Aggregation in DataFrames

#### Suppose we'd like to know the total trucks estimated for 2017 and 2050, and the total difference.

We can use the .sum() method to find this out.

In [None]:
total_trucks_2017 = faf_trucks['Trucks (2017)'].sum()
print("Total Trucks (2017): " + "{:,}".format(total_trucks_2017))

In [None]:
total_trucks_2050 = faf_trucks['Trucks (2050)'].sum()
print("Total Trucks (2050): " + "{:,}".format(total_trucks_2050))

In [None]:
total_difference_2017_2050 = faf_trucks['Difference in Trucks (2017 to 2050)'].sum()
print("Difference in Trucks (2017 to 2050): " + "{:,}".format(total_difference_2017_2050))

#### Suppose we want to know the top 10 commodity types for difference in trucks from 2017 to 2050, at both the lowest and highest extremes.

To find this out, we can use the .sort_values() method, indexing, and iterating through the entries using the .iterrows() method. .iterrows() returns an iterable of index, entry pairings.

In [None]:
# Slicing for first 10 entries with [:10]
bot_10_difference = faf_trucks.sort_values(by='Difference in Trucks (2017 to 2050)', ascending=True)[:10]

In [None]:
print("Bottom 10 Differences in Trucks from 2017 to 2050 by Commodity Type")
for index, entry in bot_10_difference.iterrows():
    print(entry['Commodity Type'] + ", " + "{:,}".format(entry['Difference in Trucks (2017 to 2050)']))

In [None]:
top_10_difference = faf_trucks.sort_values(by='Difference in Trucks (2017 to 2050)', ascending=False)[:10]

In [None]:
print("Top 10 Differences in Trucks from 2017 to 2050 by Commodity Type")
for index, entry in top_10_difference.iterrows():
    print(entry['Commodity Type'] + ", " + "{:,}".format(entry['Difference in Trucks (2017 to 2050)']))
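
As an alternative to sorting and slicing, Pandas also provides .nlargest() and .nsmallest(), which select the top or bottom n rows by a column. The sketch below uses a made-up DataFrame with a shortened column name; it is not the FAF data.

```python
import pandas as pd

# Made-up values for illustration only
demo = pd.DataFrame({
    "Commodity Type": ["Coal", "Gravel", "Cereal grains", "Electronics"],
    "Difference": [-500, 1200, 300, 900],
})

top_2 = demo.nlargest(2, "Difference")
bottom_2 = demo.nsmallest(2, "Difference")
print(top_2["Commodity Type"].tolist())     # ['Gravel', 'Electronics']
print(bottom_2["Commodity Type"].tolist())  # ['Coal', 'Cereal grains']

# Same rows as sorting in descending order and slicing the first two
via_sort = demo.sort_values(by="Difference", ascending=False)[:2]
print(top_2.equals(via_sort))               # True
```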

#### Let's find out what commodity types and modes have the largest total differences in tons from 2017 to 2050.

Because each commodity type and mode appears multiple times throughout the original DataFrame, to retrieve this aggregate information, we must use the .groupby(by=column) method. .groupby(by=column) returns a pairwise iterable of the unique values in the column with its associated subset DataFrames with all entries that share that same column value.

We'll start with commodity types.

In [None]:
groups_commodity = faf_data.groupby(by="Commodity Type") # Returns an iterable of DataFrames, paired with name of commodity
difference_comms = [] # Set up initial list of tuples for [(value, name),...], where value comes first to be sorted on.
for name, group in groups_commodity:
    print(name)
    print(group.head())
    comm_difference = group["Difference in Tons (2017 to 2050)"].sum() # Sum of values for this subset DataFrame
    difference_comms.append((comm_difference, name))
difference_comms = sorted(difference_comms)

Now we'll index for our bottom 10 and top 10 commodity difference values and print out our results.

In [None]:
bot_10_comm_diff = difference_comms[:10]
top_10_comm_diff = difference_comms[::-1][:10] # [::-1] reverses list (walking backwards), providing list in descending order

In [None]:
print("Bottom 10 Differences in Tons from 2017 to 2050 by Commodity Type")
for diff, name in bot_10_comm_diff:
    print(name + ": " + "{:,}".format(round(diff, 3)))

In [None]:
print("Top 10 Differences in Tons from 2017 to 2050 by Commodity Type")
for diff, name in top_10_comm_diff:
    print(name + ": " + "{:,}".format(round(diff, 3)))

Repeating the .groupby() to find out the total differences in modes.

In [None]:
groups_mode = faf_data.groupby(by="Mode") # Returns an iterable of DataFrames, paired with name of mode
mode_differences = []

for name, group in groups_mode:
    mode_difference = group["Difference in Tons (2017 to 2050)"].sum()
    mode_differences.append((mode_difference, name))
mode_differences = sorted(mode_differences)[::-1] # Descending order

In [None]:
print("Differences in Tons from 2017 to 2050 by Mode")
for diff, name in mode_differences:
    print(name + ": " + "{:,}".format(round(diff, 3)))
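
The explicit loop over groups is useful for seeing what .groupby() produces. For a simple total per group, the same result can also be obtained by selecting the column on the grouped object and calling .sum() directly. A sketch on made-up data (shortened column name, hypothetical values):

```python
import pandas as pd

# Made-up values for illustration only
demo = pd.DataFrame({
    "Mode": ["Truck", "Rail", "Truck", "Water"],
    "Difference in Tons": [100.0, 40.0, 60.0, -10.0],
})

# One total per mode, sorted in descending order
totals = (
    demo.groupby(by="Mode")["Difference in Tons"]
    .sum()
    .sort_values(ascending=False)
)

for name, diff in totals.items():
    print(name + ": " + "{:,}".format(round(diff, 3)))
```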

## Appendix: Basic Visualization with Matplotlib.pyplot and Pandas

#### First let's plot a horizontal bar chart visualizing the difference in tons from 2017 to 2050 by commodity type. We'll use Matplotlib's direct functionality for this because we have the data available in list form.

Using the list of (tons, commodity) values generated from before, we'll call Matplotlib's .barh(y, x) method and related helper methods to edit the plot as desired. In practice, this takes some back-and-forth with documentation in order to achieve a "pretty enough" chart.

In [None]:
# Set up the plot with desired figure size (width, height)
fig, ax = plt.subplots(figsize=(11, 20))
ax.set_title("Difference in Tons (2017 to 2050) by Commodity Type")
# Plot each bar with ax.barh(y, x)
for data, comm in difference_comms:
    ax.barh(comm, data, alpha=0.7)
# Add labels specifying each value from ax.containers
for container in ax.containers:
    ax.bar_label(container, padding=2, fontsize=10, labels=[f'{x:,.0f}' for x in container.datavalues])
# Extend the left of the chart for label fitting by inserting at the front
x_ticks = ax.get_xticks()
x_ticks = np.insert(x_ticks, 0, 2*x_ticks[0])
ax.set_xticks(x_ticks)
# Put a grid over the chart
ax.grid(alpha=0.5)
# Add labels along each axis
ax.set_ylabel("Commodity Type")
ax.set_xlabel("Tons (10s of Millions)")
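
Calling ax.barh() once per bar, as above, gives each bar its own color and its own container. If a single color is acceptable, the bars can also be drawn in one call by passing lists of labels and values. The sketch below uses made-up (value, name) pairs in the same shape as the lists built earlier.

```python
import matplotlib.pyplot as plt

# Made-up (value, name) pairs, in ascending order like the sorted lists above
pairs = [(-10.0, "Water"), (40.0, "Rail"), (160.0, "Truck")]
values = [value for value, _ in pairs]
labels = [name for _, name in pairs]

fig, ax = plt.subplots(figsize=(8, 3))
bars = ax.barh(labels, values, alpha=0.7)  # one call draws all bars
ax.bar_label(bars, padding=2, fontsize=10, labels=[f"{v:,.0f}" for v in values])
ax.grid(alpha=0.5)
ax.set_xlabel("Tons")
```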

#### Next let's similarly plot a horizontal bar chart for difference in tons from 2017 to 2050 by mode.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Difference in Tons (2017 to 2050) by Mode")
for data, mode in mode_differences[::-1]:
    ax.barh(mode, data, alpha=0.5)
for container in ax.containers:
    ax.bar_label(container, padding=2, fontsize=10, labels=[f'{x:,.0f}' for x in container.datavalues])
x_ticks = ax.get_xticks()
# Extend the right-end of the ticks to fit our label
x_ticks = np.append(ax.get_xticks(), [x_ticks[-1]+x_ticks[1]])
ax.set_xticks(x_ticks)
ax.grid(alpha=0.5)
ax.set_ylabel("Mode")
ax.set_xlabel("Tons (10s of Millions)")

#### Let's also plot a grouped bar chart showing the trucks in 2017 versus trucks in 2050 by commodity type.

This can be done with Pandas directly from the DataFrame, which uses plotting functionality from Matplotlib with some shorter syntax.

The lists we used earlier for difference in tons by commodity type and mode were already in sorted order, but the DataFrame is not. We will therefore first sort the values of the column of interest, retrieve their indices in order with .index, and re-order the entries in a new DataFrame with .reindex(indices).

In [None]:
in_order = faf_trucks["Trucks (2050)"].sort_values().index
faf_trucks_2 = faf_trucks.reindex(in_order)
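
The two-step sort-then-reindex approach shows how row labels can be used to re-order a DataFrame. Calling .sort_values() on the DataFrame itself should give the same ordering in one step, as the sketch below checks on made-up data.

```python
import pandas as pd

# Made-up values for illustration only
demo = pd.DataFrame({
    "Commodity Type": ["A", "B", "C"],
    "Trucks (2050)": [30, 10, 20],
})

# Two steps: sort the column, then re-order the rows by the sorted labels
in_order = demo["Trucks (2050)"].sort_values().index
via_reindex = demo.reindex(in_order)

# One step: sort the DataFrame by the column
via_sort = demo.sort_values(by="Trucks (2050)")

print(via_reindex["Commodity Type"].tolist())  # ['B', 'C', 'A']
print(via_reindex.equals(via_sort))            # True
```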

In [None]:
# Plot the DataFrame directly using .plot(), and retrieve the Axes object for further editing
ax = faf_trucks_2.plot(x='Commodity Type', y=['Trucks (2017)', 'Trucks (2050)'], kind='barh', figsize=(10, 30), alpha=0.7)
ax.set_title("Trucks in 2017 vs. 2050 by Commodity Type")
for container in ax.containers:
    ax.bar_label(container, padding=2, fontsize=10, labels=[f'{x:,.0f}' for x in container.datavalues])
x_ticks = ax.get_xticks()
x_ticks = np.append(ax.get_xticks(), [x_ticks[-1]+x_ticks[1]])
ax.set_xticks(x_ticks)
ax.grid(alpha=0.5)
ax.legend(loc="center right")

In [None]:
in_order = faf_trucks["Difference in Trucks (2017 to 2050)"].sort_values().index
faf_trucks_2 = faf_trucks.reindex(in_order)

In [None]:
ax = faf_trucks_2.plot(x='Commodity Type', y='Difference in Trucks (2017 to 2050)', kind="barh", figsize=(10, 30), alpha=0.7)
ax.set_title("Difference in Trucks from 2017 to 2050 by Commodity Type")
for container in ax.containers:
    ax.bar_label(container, padding=2, fontsize=10, labels=[f'{x:,.0f}' for x in container.datavalues])
x_ticks = ax.get_xticks()
x_ticks = np.append(ax.get_xticks(), [x_ticks[-1]+x_ticks[1]])
x_ticks = np.insert(x_ticks, 0, [(x_ticks[0]-x_ticks[1])*3])
ax.set_xticks(x_ticks)
ax.grid(alpha=0.5)
ax.legend(loc="center right")