### ECE/CS/ISyE 524 &mdash; Introduction to Optimization ###

# Sentiment Adjusted Trading Optimization #

#### Will Geister (wgeister@wisc.edu), Student 2 (email address), Student 3 (email address), and Student 4 (email address)

*****

### Table of Contents

1. [Introduction](#1.-Introduction)
1. [Mathematical Model](#2.-Mathematical-model)
1. [Implementation](#3.-Implementation)
1. [Results and Discussion](#4.-Results-and-discussion)
  1. [Optional Subsection](#4.A.-You-can-also-add-subsections)
1. [Conclusion](#5.-Conclusion)
1. [Author Contributions](#6.-Author-Contributions)

## 1. Introduction ##

The basic concept behind a stock trading optimizer is fairly simple and has been covered extensively in class. What has not been covered is how market sentiment can be folded into such an optimizer, which is the focus of this project.

## 2. Mathematical model ##

A discussion of the modeling assumptions made in the problem (e.g., is it from physics? economics? something else?). Explain the decision variables, the constraints, and the objective function, both in words and in math. Discuss the model type (LP, QP, MIP, etc.). 
For this section you should **assume the reader is familiar with the material covered in class**.

Equations should be formatted in $\LaTeX$ within the IJulia notebook. The internet is full of resources on how to use $\LaTeX$!

Here is an example of an equation:

$$
Ax=b
$$

If the equation is short, you can also write it inline $Ax=b$. This also works for explaining the different symbols. For example, for the previous equation, you would want to include the following explanation: We consider the equation $Ax=b$, where $A$ is a matrix, $x$ is a vector of variables, and $b$ is the right hand side vector. 

Here is an example of how you can write a matrix:

$$
\begin{bmatrix}
  1 & 2 \\
  3 & 4
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} 5 \\ 6 \end{bmatrix}
$$

And here is an example of how to typically write an optimization problem. Notice the use of the "aligned" environment, which aligns subsequent equations at the position of the \& signs:

$$
\begin{aligned}
\underset{x \in \mathbb{R}^n}{\text{maximize}}\quad & \mathbf{E}^T x \\
\text{subject to:} \quad & \sum_{i=1}^{n} x_i = 1, \\
& x_i \leq 0.3 \quad \text{for all} \quad i = 1, \dots, n, \\
& x_i \geq 0 \quad \text{for all} \quad i = 1, \dots, n
\end{aligned}
$$
Where:
  - $x_i$ represents the allocation to the $i$-th stock in the portfolio.
  - $\mathbf{E}$ is the vector of expected returns for each stock, i.e., $\mu_{NS}$ from the code.
  - The objective function is to maximize the expected return of the portfolio, $\mathbf{E}^T x$, which is the dot product of the expected return vector and the allocation vector.
  - The first constraint ensures that the sum of the allocations is 1, meaning the portfolio is fully invested.
  - The second constraint limits the maximum allocation to any single stock to 30\% (i.e., $x_i \leq 0.3$).
  - The third constraint ensures that the allocations are non-negative (i.e., $x_i \geq 0$).
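
As a small illustration of how the three constraints interact, the return-only problem above can be solved by hand: give the 30% cap to the stocks with the highest expected return until the budget of 1 is used up. The sketch below assumes five hypothetical stocks with made-up expected returns; `greedy_allocation` is a helper name introduced here and is not part of the project code. Note that the cap makes the problem infeasible with fewer than four stocks, since $0.3n \geq 1$ requires $n \geq 4$.

```julia
# Hypothetical expected returns for five stocks (illustrative numbers only)
E_demo = [0.020, 0.050, 0.010, 0.040, 0.030]

# Solve: maximize E'x  s.t.  sum(x) == 1, 0 <= x[i] <= cap
# For this particular LP, filling the best stocks up to the cap is optimal.
function greedy_allocation(E::Vector{Float64}; cap::Float64 = 0.3)
    length(E) * cap >= 1 || error("infeasible: need at least ceil(1/cap) stocks")
    x = zeros(length(E))
    remaining = 1.0
    for i in sortperm(E, rev = true)   # highest expected return first
        x[i] = min(cap, remaining)
        remaining -= x[i]
        remaining <= 0 && break
    end
    return x
end

x_demo = greedy_allocation(E_demo)
println(x_demo)
```

With these numbers the three best stocks each receive 30%, the fourth best receives the remaining 10%, and the worst stock receives nothing. This shortcut only applies to the linear objective; once the risk term is added, a QP solver is needed.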

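The code also penalizes risk, measured by the portfolio variance $x^T \sigma x$, where $\sigma$ is the covariance matrix of stock returns. The constraints are unchanged. Because the penalty is quadratic in $x$ and a covariance matrix is positive semidefinite, the model becomes a convex quadratic program (QP). The code solves it in the equivalent form of minimizing $-\mathbf{E}^T x + \lambda x^T \sigma x$:
$$
\text{Objective:} \quad \underset{x}{\text{maximize}} \quad \mathbf{E}^T x - \lambda x^T \sigma x
$$
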
Where:
  - $\sigma$ is the covariance matrix of stock returns.
  - $\lambda$ is the risk aversion parameter (set to 0.5 in the code). The term $\lambda x^T \sigma x$ penalizes the variance of the portfolio, thus balancing between maximizing return and minimizing risk.
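
To make the trade-off concrete, here is a minimal numeric sketch of the risk-adjusted objective for an equal-weight portfolio of four stocks. All numbers are hypothetical, the covariance matrix is taken to be diagonal purely to keep the arithmetic easy to follow, and the `_demo` names are introduced here for illustration.

```julia
using LinearAlgebra

# Hypothetical inputs (not taken from the project data)
E_demo = [0.05, 0.02, 0.03, 0.04]                   # expected returns
sigma_demo = Diagonal([0.04, 0.01, 0.02, 0.03])     # covariance matrix (uncorrelated stocks)
lambda_demo = 0.5                                   # risk aversion, same value as in the code
x_demo = fill(0.25, 4)                              # equal-weight allocation, sums to 1

expected_return = dot(E_demo, x_demo)               # E'x
variance = dot(x_demo, sigma_demo * x_demo)         # x' * sigma * x
objective = expected_return - lambda_demo * variance

println((expected_return, variance, objective))
```

Here the expected return is about 0.035 and the variance about 0.00625, so the penalty removes about 0.003125 and the objective is about 0.031875. A larger $\lambda$ would shrink the objective further for the same allocation.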

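Everything above describes how a standard trade optimizer works. Our "sentiment-adjusted" trade optimizer adds one new ingredient: each stock's expected return is shifted by the average sentiment score of its sector, scaled by a user-chosen sentiment weight (set to 10 in the code). The constraints and the risk term are unchanged.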
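
Written out, the adjustment made in the Section 3 code can be stated as follows. The symbols $w$ (the `sentiment_weight` parameter), $s_k$ (the average sentiment score of sector $k$), and $c(i)$ (the sector of stock $i$) are notation introduced here for illustration; $\mu^{NS}$ and $\mu^{S}$ correspond to `muNS` and `muS` in the code.

$$
\mu^{S}_i = \mu^{NS}_i + w \, s_{c(i)} \quad \text{for all} \quad i = 1, \dots, n
$$

A stock whose sector or sentiment score is unknown receives no adjustment, i.e., $s_{c(i)} = 0$. The sentiment-adjusted problem is then the same QP with $\mu^{S}$ in place of $\mathbf{E}$:

$$
\begin{aligned}
\underset{x \in \mathbb{R}^n}{\text{maximize}}\quad & (\mu^{S})^T x - \lambda x^T \sigma x \\
\text{subject to:} \quad & \sum_{i=1}^{n} x_i = 1, \\
& 0 \leq x_i \leq 0.3 \quad \text{for all} \quad i = 1, \dots, n
\end{aligned}
$$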


## 3. Implementation ##



Here, you should code up your model in Julia + JuMP and solve it. Your code should be clean, easy to read, well annotated and commented, and it should compile! You are not allowed to use other programming languages or DCP packages such as `convex.jl`. **We will be running your code, and you want to make sure that everything works if we run the code blocks in sequence**. Having multiple code blocks separated by text blocks (either as separate cells or blocks of comments) that explain the various parts of your solution will make it much easier for us to understand your project. 

We would also like to see that you make several variants or run different kinds of analysis for your problem (e.g., by changing the input parameters, the constraints or objective, or deriving a dual problem). We expect at least 1-2 such problem variants as part of your project.

**Remember that if you do not write your description of the project and comment your code well, we cannot understand what you have done. Even if it is technically brilliant, you will lose points if you do not write well and comment your code well.**
It's fine to call solvers and external packages we have used in the course such as `Gurobi` or `Plots`, but try to minimize the use of other packages. We want to be able to understand what is happening in your code without looking up additional references. 

In [7]:
using CSV
using DataFrames
using Statistics
using JuMP
using Ipopt

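# Load stock data
stock_data = CSV.read("master_stock_data_sectors.csv", DataFrame)

# Add a :return column holding the simple period-over-period return of :close.
# The first row has no previous close, so its return is `missing`.
function compute_returns(df)
    df = sort(df, :ticker)  # no-op within a single-ticker group; rows are assumed to already be in time order
    returns = [missing; diff(df.close) ./ df.close[1:end-1]]
    df[!, :return] = returns
    return df
end
println("Data Read IN")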

Data Read IN


In [8]:
returns_data = combine(groupby(stock_data, :ticker), compute_returns) # Compute returns
returns_wide = unstack(returns_data, :ticker, :return; combine=mean)
returns_wide_clean = dropmissing(returns_wide) # Drop rows with missing returns

clean_stocks = names(returns_wide_clean) # Stocks that have all data
valid_stock_data = filter(row -> row[:ticker] in clean_stocks, stock_data)
returns_matrix = Matrix{Float64}(returns_wide_clean[:, clean_stocks])

sigma = cov(returns_matrix) # Compute the covariance matrix for the returns (sigma)
println("Sigma calculated")

Sigma calculated
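
Since the QP is only convex when the covariance matrix is positive semidefinite, a quick numerical check can be reassuring before handing the matrix to the solver. The sketch below uses a small hypothetical returns matrix (rows are time periods, columns are stocks) in place of `returns_matrix`; the `_demo` names and the tolerance of `1e-12` are assumptions made for this illustration.

```julia
using LinearAlgebra
using Statistics

# Hypothetical returns: 5 time periods (rows) x 3 stocks (columns)
returns_demo = [ 0.010  0.004 -0.002;
                -0.005  0.002  0.001;
                 0.007 -0.003  0.004;
                 0.002  0.001 -0.001;
                -0.004  0.006  0.003]

sigma_demo = cov(returns_demo)              # 3 x 3 sample covariance of the columns

# A covariance matrix is PSD in exact arithmetic; allow a tiny negative
# eigenvalue to account for floating-point round-off.
min_eig = eigmin(Symmetric(sigma_demo))
is_psd = min_eig >= -1e-12

println("smallest eigenvalue: ", min_eig, "  PSD: ", is_psd)
```

The same two lines (`eigmin(Symmetric(...))` and the comparison) could be applied to `sigma` directly.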


In [37]:
ticker_sector = CSV.read("tickers_with_sectors.csv", DataFrame)
sector_sentiment = CSV.read("sector_sentiment.csv", DataFrame)

muNS = Float64[]

# lookup dictionaries
ticker_to_sector = Dict(row.ticker => row.sector for row in eachrow(ticker_sector))
sector_to_sentiment = Dict(row.Sector => row.Average_Sentiment_Score for row in eachrow(sector_sentiment))

for stock in clean_stocks
    timepoints = subset(valid_stock_data, :ticker => ByRow(==(stock)))
    expected_return = 0.0
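    for time in eachrow(timepoints)
        # open-to-close return: positive when the price rises during the period
        ret = (time[:close] - time[:open]) / time[:open]
        expected_return += ret
    end
    expected_return /= 100  # average, assuming 100 data points per ticker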

    if ismissing(expected_return) || isnan(expected_return) || isinf(expected_return) # Bad stock catch
        continue
    end

    push!(muNS, expected_return)
end

extra_elements_needed = size(sigma, 1) - length(muNS)
if extra_elements_needed > 0 # If extra elements are needed, add duplicate entries to mu, manually fix data later
    push!(muNS, repeat([muNS[end]], extra_elements_needed)...)
end

lambda = 0.5
model = Model(Ipopt.Optimizer)
n_stocks = length(muNS)

@variable(model, x[1:n_stocks] >= 0)
@constraint(model, sum(x) == 1)

@constraint(model, [i=1:n_stocks], x[i] <= 0.3)  # Limit: max 30% allocation per stock

@objective(model, Min, -muNS' * x + lambda * (x' * sigma * x))

set_optimizer_attribute(model, "print_level", 0)  # silence IPopt

optimize!(model)

stocks_info = [(clean_stocks[i], value(x[i]), muNS[i]) for i in 1:n_stocks]

# Sort stocks by optimal allocation (descending) for risk categorization
sorted_stocks = sort(stocks_info, by = x -> x[2], rev=true)

# List of tickers to drop <-- this is needed because of empty data slots from our api calls and a generative function that assigns them for unused matrix slots
tickers_to_drop = Set(["open", "close", "high", "low", "volume", "sector", "CCZ"])
filtered_stocks = filter(stock_info -> !(stock_info[1] in tickers_to_drop), sorted_stocks)

# Add sector information to each stock to better understand the sentiment
stocks_with_sectors = [
    (stock[1], stock[2], stock[3], get(ticker_to_sector, stock[1], "Unknown")) 
    for stock in filtered_stocks
]

# Write to CSV, including sector
csv_filename = "top_stocks_no_sentiment.csv"
CSV.write(csv_filename, DataFrame(Stock = [stock[1] for stock in stocks_with_sectors],
                                 Allocation = [stock[2] * 100 for stock in stocks_with_sectors],
                                 ExpectedReturn = [stock[3] * 100 for stock in stocks_with_sectors],
                                 Sector = [stock[4] for stock in stocks_with_sectors]))

println("Results saved to $csv_filename.")


Results saved to top_stocks_no_sentiment.csv.


In [38]:
ticker_sector = CSV.read("tickers_with_sectors.csv", DataFrame)
sector_sentiment = CSV.read("sector_sentiment.csv", DataFrame)

# lookup dictionaries
ticker_to_sector = Dict(row.ticker => row.sector for row in eachrow(ticker_sector))
sector_to_sentiment = Dict(row.Sector => row.Average_Sentiment_Score for row in eachrow(sector_sentiment))

# user-defined weight for sentiment influence
sentiment_weight = 10.0  # ADJUST HERE

muS = Float64[]

for stock in clean_stocks
    timepoints = subset(valid_stock_data, :ticker => ByRow(==(stock)))
    expected_return = 0.0

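    for time in eachrow(timepoints)
        # open-to-close return: positive when the price rises during the period
        ret = (time[:close] - time[:open]) / time[:open]
        expected_return += ret
    end

    expected_return /= 100  # average, assuming 100 data points per ticker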

    if ismissing(expected_return) || isnan(expected_return) || isinf(expected_return)
        continue
    end

    # Apply sector sentiment adjustment (0.0 when the sector or its score is unknown)
    sector = get(ticker_to_sector, stock, nothing)
    sentiment_adjustment = 0.0
    if sector !== nothing
        sentiment_adjustment = get(sector_to_sentiment, sector, 0.0) * sentiment_weight
    end

    push!(muS, expected_return + sentiment_adjustment)
end

extra_elements_needed = size(sigma, 1) - length(muS)
if extra_elements_needed > 0 # If extra elements are needed, add duplicate entries to mu, manually fix data later
    push!(muS, repeat([muS[end]], extra_elements_needed)...)
end

lambda = 0.5
model = Model(Ipopt.Optimizer)
n_stocks = length(muS)

@variable(model, x[1:n_stocks] >= 0)
@constraint(model, sum(x) == 1)

@constraint(model, [i=1:n_stocks], x[i] <= 0.3)  # Limit: max 30% allocation per stock

@objective(model, Min, -muS' * x + lambda * (x' * sigma * x))

set_optimizer_attribute(model, "print_level", 0)  # silence IPopt

optimize!(model)

stocks_info = [(clean_stocks[i], value(x[i]), muS[i]) for i in 1:n_stocks]

# Sort stocks by optimal allocation (descending) for risk categorization
sorted_stocks = sort(stocks_info, by = x -> x[2], rev=true)


# List of tickers to drop <-- this is needed because of empty data slots from our api calls and a generative function that assigns them for unused matrix slots
tickers_to_drop = Set(["open", "close", "high", "low", "volume", "sector", "CCZ"])
filtered_stocks = filter(stock_info -> !(stock_info[1] in tickers_to_drop), sorted_stocks)

# Add sector information to each stock to better understand the sentiment
stocks_with_sectors = [
    (stock[1], stock[2], stock[3], get(ticker_to_sector, stock[1], "Unknown")) 
    for stock in filtered_stocks
]

# Write to CSV, including sector
csv_filename = "top_stocks_with_sentiment.csv"
CSV.write(csv_filename, DataFrame(Stock = [stock[1] for stock in stocks_with_sectors],
                                 Allocation = [stock[2] * 100 for stock in stocks_with_sectors],
                                 ExpectedReturn = [stock[3] * 100 for stock in stocks_with_sectors],
                                 Sector = [stock[4] for stock in stocks_with_sectors]))

println("Results saved to $csv_filename.")

Results saved to top_stocks_with_sentiment.csv.
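
The sentiment adjustment used in the cell above can be isolated into a small helper, which makes it easier to see the effect of `sentiment_weight` without running the solver. This is a minimal sketch with hypothetical tickers, sectors, and scores; `sentiment_adjusted_returns` and every `demo_` name are introduced here for illustration and are not part of the project code.

```julia
# Shift each base return by weight * (sector sentiment score).
# Tickers with an unknown sector, or sectors with no score, get no adjustment.
function sentiment_adjusted_returns(tickers, base_returns, ticker_to_sector,
                                    sector_to_sentiment; weight = 10.0)
    adjusted = similar(base_returns)
    for (i, ticker) in enumerate(tickers)
        sector = get(ticker_to_sector, ticker, nothing)
        score = sector === nothing ? 0.0 : get(sector_to_sentiment, sector, 0.0)
        adjusted[i] = base_returns[i] + weight * score
    end
    return adjusted
end

# Hypothetical inputs (illustrative only)
demo_tickers = ["AAA", "BBB", "CCC"]
demo_returns = [0.010, 0.020, 0.015]
demo_ticker_to_sector = Dict("AAA" => "Technology", "BBB" => "Energy")   # "CCC" has no sector
demo_sector_to_sentiment = Dict("Technology" => 0.002, "Energy" => -0.001)

demo_muS = sentiment_adjusted_returns(demo_tickers, demo_returns,
                                      demo_ticker_to_sector, demo_sector_to_sentiment)
println(demo_muS)
```

In this toy example the adjustment reverses the ranking of the first two stocks: "AAA" rises from 0.010 to about 0.030 while "BBB" falls from 0.020 to about 0.010. Depending on the scale of the real sentiment scores, a weight of 10 may similarly dominate the historical returns, so it is worth trying several values of `sentiment_weight`.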


## 4. Results and discussion ##

In this section, you display and discuss the results. Show figures, plots, images, trade-off curves, or whatever else you can think of to best illustrate your results. The discussion should explain what the results mean, and how to interpret them. You should also explain the limitations of your approach/model and how sensitive your results are to the assumptions you made.

Use plots (see `PyPlot` examples from class), or you can display results in a table like this:

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned |\$1600 |
| col 2 is      | centered      |  \$12 |
| zebra stripes | are neat      |   \$1 |


### You can change the format of sections 2, 3 and 4 to accommodate multiple versions of your model, sensitivity analysis, etc. Just make sure that you answer all the questions we have stated in the final version of your report.

### 4.A. You can also add subsections

#### 4.A.a. or subsubsections
Having more structure in the report can help readers understand your analysis and results!

## 5. Conclusion ##

Summarize your findings and your results, and talk about at least one possible future direction; something that might be interesting to pursue as a follow-up to your project.

## 6. Author Contributions

Note: The contributions in each category must sum to 100%. See Canvas for more details on what type of work belongs in each category.

#### 1. Modelling  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  

  
#### 2. Analysis  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 3. Data Gathering  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 4. Software Implementation  
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  


#### 5. Report Writing    
Student A: Percentage contribution %  
Student B: Percentage contribution %  
Student C: Percentage contribution %  