Data
------------------------------
First we need to load the cars CSV file into AMPL, which we do using pandas. We tried to follow AMPL's [best practices](https://dev.ampl.com/ampl/best-practices/amplpy-best-practices.html) for loading the data.

Let's first import our packages and make sure we are in the parent directory, so that all of the project files are accessible.


In [1]:
import pandas as pd 
import numpy as np
import os
from amplpy import AMPL
os.chdir("..")
os.getcwd()


'/Users/kevin/Documents/GitHub/Math5593LinearProgrammingProject'

Now let's take a look at the first rows of the data and compute some summary statistics.

In [2]:
cars = pd.read_csv("data/mtcars.csv", index_col = 0)
cars

model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [3]:
cars.describe()

,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
count,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0
mean,20.090625,6.1875,230.721875,146.6875,3.596563,3.21725,17.84875,0.4375,0.40625,3.6875,2.8125
std,6.026948,1.785922,123.938694,68.562868,0.534679,0.978457,1.786943,0.504016,0.498991,0.737804,1.6152
min,10.4,4.0,71.1,52.0,2.76,1.513,14.5,0.0,0.0,3.0,1.0
25%,15.425,4.0,120.825,96.5,3.08,2.58125,16.8925,0.0,0.0,3.0,2.0
50%,19.2,6.0,196.3,123.0,3.695,3.325,17.71,0.0,0.0,4.0,2.0
75%,22.8,8.0,326.0,180.0,3.92,3.61,18.9,1.0,1.0,4.0,4.0
max,33.9,8.0,472.0,335.0,4.93,5.424,22.9,1.0,1.0,5.0,8.0


Now let's get the data into a form that is workable within our AMPL project. Our .mod file has two sets: one for the observations (cars) and one for the predictor variables, whose coefficients we estimate. Thus, in order to load the data we first need it in long format, with one row per (car, variable) pair, like the following:

| Individual I | Variable J | Value |
|--------------|------------|-------|
| Mazda RX4    | mpg        | 21    |
| Mazda RX4    | cyl        | 6     |

From this point it is fairly simple to load the data into AMPL through amplpy, for example with its `set_data` method.


In [4]:
# Get the data into long format (the model names are the index of `cars`)

# Separate the response y from the predictors X
y = cars["mpg"]

# Note that we chose only the straightforward numeric predictors for this example
X = cars[["cyl", "disp", "hp", "wt", "qsec"]]

# Labels of the predictors (the elements of the AMPL set Variables)
J = X.columns.tolist()

# Stack the predictor columns into one (Car, Variable, Value) row per entry
cars_sub = cars[["cyl", "disp", "hp", "wt", "qsec"]]
cars_long = cars_sub.stack().reset_index()
cars_long.columns = ["Car", "Variable", "Value"]

# AMPL parameters must be numeric, so keep only rows whose value is numeric
cars_long = cars_long[pd.to_numeric(cars_long["Value"], errors="coerce").notnull()]

cars_long.head(15)


,Car,Variable,Value
0,Mazda RX4,cyl,6.0
1,Mazda RX4,disp,160.0
2,Mazda RX4,hp,110.0
3,Mazda RX4,wt,2.62
4,Mazda RX4,qsec,16.46
5,Mazda RX4 Wag,cyl,6.0
6,Mazda RX4 Wag,disp,160.0
7,Mazda RX4 Wag,hp,110.0
8,Mazda RX4 Wag,wt,2.875
9,Mazda RX4 Wag,qsec,17.02


In [5]:

# Set elements: car names and predictor names
df_car = cars.index.to_frame(name="Model")
df_var = pd.DataFrame({"Variables": ["cyl", "disp", "hp", "wt", "qsec"]})

# Response y, indexed by car name
df_y = pd.DataFrame({
    "Car": cars.index.astype(str),   # car NAMES
    "y": cars["mpg"].values
})
df_y_indexed = df_y.set_index("Car")

# Work on a copy so that cars_long is not modified in place
df_x = cars_long.copy()
df_x["Car"] = df_x["Car"].astype(str)
df_x["Variable"] = df_x["Variable"].astype(str)
df_x["Value"] = df_x["Value"].astype(float)

# Rename the columns to match the AMPL names (set Variables, parameter x)
df_x_fixed = df_x.rename(columns={
    "Variable": "Variables",
    "Value": "x"
})

# Dictionary keyed by (Car, Variables) tuples, one entry per value of x
x_dict = df_x_fixed.set_index(['Car', 'Variables'])['x'].to_dict()

print(df_car.head(5))
print(df_var.head(5))
print(df_y_indexed.head(5))
print(df_x.head(5))


                               Model
model                               
Mazda RX4                  Mazda RX4
Mazda RX4 Wag          Mazda RX4 Wag
Datsun 710                Datsun 710
Hornet 4 Drive        Hornet 4 Drive
Hornet Sportabout  Hornet Sportabout
  Variables
0       cyl
1      disp
2        hp
3        wt
4      qsec
                      y
Car                    
Mazda RX4          21.0
Mazda RX4 Wag      21.0
Datsun 710         22.8
Hornet 4 Drive     21.4
Hornet Sportabout  18.7
         Car Variable   Value
0  Mazda RX4      cyl    6.00
1  Mazda RX4     disp  160.00
2  Mazda RX4       hp  110.00
3  Mazda RX4       wt    2.62
4  Mazda RX4     qsec   16.46


Let's Load in the Model
======================

## Sets
- $i \in \text{Car}$
- $j \in \text{Variables}$

---

## Parameters
- $y_i$: response variable  
- $x_{ij}$: predictor matrix  
- $t$: L1 (lasso) budget  

---

## Decision Variables
- $b_j^+ \ge 0$  
- $b_j^- \ge 0$  

for all $j \in \text{Variables}$

---

## Objective Function (Least Squares)

$$
\min_{b^+, b^-} 
\sum_{i \in \text{Car}}
\left( 
y_i - \sum_{j \in \text{Variables}} (b_j^+ - b_j^-) x_{ij}
\right)^2
$$

---

## Constraint (L1 Budget)

$$
\sum_{j \in \text{Variables}} (b_j^+ + b_j^-) \le t
$$
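
As a short check on why the split into $b_j^+$ and $b_j^-$ works, here is a sketch of the standard argument. It is not something stated in the .mod file, and the symbol $\beta_j = b_j^+ - b_j^-$ is introduced here only for illustration. Because both parts are nonnegative,

$$
|\beta_j| = |b_j^+ - b_j^-| \le b_j^+ + b_j^-,
$$

with equality whenever at most one of $b_j^+$, $b_j^-$ is nonzero. So every feasible point of the split model satisfies $\sum_{j \in \text{Variables}} |\beta_j| \le t$. Conversely, for any $\beta$ with $\sum_{j \in \text{Variables}} |\beta_j| \le t$, the choice

$$
b_j^+ = \max(\beta_j, 0), \qquad b_j^- = \max(-\beta_j, 0)
$$

is feasible and gives the same objective value, so the split model should reach the same optimum as the usual lasso constraint on $\sum_{j} |\beta_j|$.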


In [6]:
from amplpy import AMPL
from amplpy import DataFrame

# Load our AMPL model first
Lasso_Regression = AMPL()
Lasso_Regression.reset()
Lasso_Regression.read("models/L Reg Attempt.mod")

# Make sure we got the correct sets and parameters
print("SETS:")
for s in Lasso_Regression.get_sets():
    print(" -", s)

print("\nPARAMETERS:")
for p in Lasso_Regression.get_parameters():
    print(" -", p)


SETS:
 - ('Car', <amplpy.ampl.Set object at 0x1373d0b80>)
 - ('Variables', <amplpy.ampl.Set object at 0x1373d09a0>)

PARAMETERS:
 - ('y', <amplpy.ampl.Parameter object at 0x1373d0b30>)
 - ('x', <amplpy.ampl.Parameter object at 0x1373d0b80>)
 - ('t', <amplpy.ampl.Parameter object at 0x1373d0b30>)


Loading in The Data
----------------
Here we actually load our data into the model. This process is a little tricky, since it is hard to control exactly what becomes a set and what becomes a parameter when using the `set_data` function. Instead we used the `.set` and `.param` accessors, together with the `.get_parameter` and `.set_values` functions.

Loading in Our Sets
-----

In [None]:
# Load the Car set
Lasso_Regression.set["Car"] = df_car["Model"].astype(str)

# Check that the set was loaded correctly
print(Lasso_Regression.get_set("Car").get_values().to_pandas())

# Load the Variables set
Lasso_Regression.set["Variables"] = df_var["Variables"].astype(str)

# Check that the set was loaded correctly
print(Lasso_Regression.get_set("Variables").get_values().to_pandas())

Empty DataFrame
Columns: []
Index: [Mazda RX4, Mazda RX4 Wag, Datsun 710, Hornet 4 Drive, Hornet Sportabout, Valiant, Duster 360, Merc 240D, Merc 230, Merc 280, Merc 280C, Merc 450SE, Merc 450SL, Merc 450SLC, Cadillac Fleetwood, Lincoln Continental, Chrysler Imperial, Fiat 128, Honda Civic, Toyota Corolla, Toyota Corona, Dodge Challenger, AMC Javelin, Camaro Z28, Pontiac Firebird, Fiat X1-9, Porsche 914-2, Lotus Europa, Ford Pantera L, Ferrari Dino, Maserati Bora, Volvo 142E]
Empty DataFrame
Columns: []
Index: [cyl, disp, hp, wt, qsec]
                        y
AMC Javelin          15.2
Cadillac Fleetwood   10.4
Camaro Z28           13.3
Chrysler Imperial    14.7
Datsun 710           22.8
Dodge Challenger     15.5
Duster 360           14.3
Ferrari Dino         19.7
Fiat 128             32.4
Fiat X1-9            27.3
Ford Pantera L       15.8
Honda Civic          30.4
Hornet 4 Drive       21.4
Hornet Sportabout    18.7
Lincoln Continental  10.4
Lotus Europa         30.4
Maserati Bora   

Loading in Our Y Parameter (Dependent Variable)
-----

In [None]:
# Now let's load the response values into the y parameter
Lasso_Regression.param["y"] = df_y_indexed["y"]
print(Lasso_Regression.get_parameter("y").get_values().to_pandas())

Loading in Our X Parameter (Independent Variables)
-----
Loading a parameter that is indexed over two sets was more difficult. The output below shows that it was loaded correctly. As the earlier code that builds `x_dict` shows, the key step is to set both indices (`Car` and `Variables`) on the pandas DataFrame first; the resulting dictionary can then be passed to AMPL through the API.
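
To make the indexing step concrete, here is a minimal sketch on a toy table. The cars `A` and `B` and the name `toy_dict` are hypothetical and used only for illustration; the pattern is the same `set_index(...).to_dict()` call used for `x_dict`.

```python
import pandas as pd

# Toy long-format table (hypothetical cars "A" and "B", not the full mtcars data)
toy_long = pd.DataFrame({
    "Car": ["A", "A", "B", "B"],
    "Variables": ["cyl", "wt", "cyl", "wt"],
    "x": [6.0, 2.62, 4.0, 2.32],
})

# Setting both index levels first turns every key into a (car, variable) tuple
toy_dict = toy_long.set_index(["Car", "Variables"])["x"].to_dict()

print(toy_dict)
```

Each key is a `(car, variable)` tuple, which lines up with a parameter declared over two sets.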

In [8]:
Lasso_Regression.get_parameter("x").set_values(x_dict)
print(Lasso_Regression.get_parameter("x").get_values().to_pandas())

                          x
index0      index1         
AMC Javelin cyl       8.000
            disp    304.000
            hp      150.000
            qsec     17.300
            wt        3.435
...                     ...
Volvo 142E  cyl       4.000
            disp    121.000
            hp      109.000
            qsec     18.600
            wt        2.780

[160 rows x 1 columns]


Loading in Our "t" Parameter
------------
The `t` parameter in our model is the L1 regularization budget, so it is just a positive value. It sets how much total absolute coefficient magnitude is allowed in the model. Basically, it limits the sum of the absolute values of the betas.
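
As a rough illustration of what the budget means, the sketch below checks a made-up coefficient vector against a made-up value of `t`. The numbers and the helper names `l1_norm` and `split_coefficients` are assumptions for this example only; the notebook does not state which value of `t` was used.

```python
def l1_norm(beta):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(b) for b in beta)


def split_coefficients(beta):
    """Split each coefficient into nonnegative parts b_plus and b_minus."""
    b_plus = [max(b, 0.0) for b in beta]
    b_minus = [max(-b, 0.0) for b in beta]
    return b_plus, b_minus


beta = [1.5, -0.5, 0.0]  # hypothetical coefficients
t = 2.5                  # hypothetical L1 budget

b_plus, b_minus = split_coefficients(beta)
used = sum(b_plus) + sum(b_minus)  # left-hand side of the budget constraint

print("budget used:", used, "within budget:", used <= t)

# Assumption: t could be assigned with the same accessor used for y, e.g.
# Lasso_Regression.param["t"] = t
```

A smaller `t` forces the coefficients to shrink, and some of them may be driven to exactly zero.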

Running the Model
======================
Now that we have the model and the data loaded into AMPL, we can finally start solving the regression.