# COGS 188 - Final Project

# Optimizing Pokémon Card Market Value: A Markov Decision Process and Q-Learning Approach

## Group members

- Ananya Krishnan
- Ava Jeong
- Charlene Hsu
- JohnWesley Pabalate

# Abstract 

Pokémon card values fluctuate significantly in the trading card market because of factors such as rarity, demand, and general collector interest. The goal of this project is to develop a Markov Decision Process (MDP) and Q-learning model that predicts and optimizes Pokémon card prices using structured data sources drawn mainly from Kaggle. Our chosen datasets include important attributes such as card type, rarity, set generation, Pokémon abilities and strengths, and historical market prices. These features will help our model analyze trends and predict the market value of these cards.

A Markov Decision Process provides a mathematical framework for modeling decision-making in dynamic environments, which makes it well suited to stochastic scenarios such as price fluctuations. In addition to the MDP formulation, we will use the Q-learning reinforcement learning algorithm to iteratively refine pricing strategies based on the datasets. The model will explore different pricing strategies to maximize long-term profitability.

The model will be trained on historical card price trends (1999-2023) and evaluated against real 2024 data. Evaluation metrics will include profitability, policy effectiveness, and inventory effectiveness. This project aims to provide users with a reliable, data-backed tool for Pokémon card pricing and trend analysis.


# Background

The Pokémon trading card market has become increasingly valuable, with rare cards sometimes selling for thousands of dollars. Factors such as rarity, collector demand, and external market conditions influence these price fluctuations<a name="shiller"></a>[<sup>[1]</sup>](#shillernote). Some individuals have even turned Pokémon card trading into a full-time business. For example, former NFL player Blake Martinez retired from football to focus on reselling Pokémon cards, generating millions in revenue<a name="martinez"></a>[<sup>[2]</sup>](#martineznote).

One of the biggest challenges in this market is pricing strategy. Traditional methods, like fixed markup pricing, fail to capture the dynamic nature of supply and demand. Online marketplaces such as TCGPlayer and eBay operate as multisided platforms, where prices shift based on buyer interest, scarcity, and competitive listings. Research on platform pricing strategies shows that businesses optimize their pricing by adjusting costs based on user behavior, similar to how Pokémon card prices rise when collector demand increases<a name="platforms"></a>[<sup>[3]</sup>](#platformsnote).

Another key factor affecting Pokémon card prices is network effects. A card’s value can skyrocket if a popular YouTuber or competitive player features it in a video. This aligns with research on indirect network effects, where an increase in engagement from one group of users (buyers) raises value for another group (sellers and trading platforms)<a name="network"></a>[<sup>[4]</sup>](#networknote). Traditional pricing models struggle to react quickly to these sudden shifts, leading to inconsistent valuations.

To handle these challenges, researchers have applied AI-based dynamic pricing models in industries like e-commerce, stock trading, and airline ticketing. Studies have found that AI-driven models outperform static pricing strategies in these fields because they can analyze real-time demand and adjust prices accordingly<a name="ai-pricing"></a>[<sup>[5]</sup>](#ai-pricingnote).

A Markov Decision Process (MDP) is a useful tool for modeling price changes over time. It allows AI models to learn from past pricing decisions and optimize future choices, even in uncertain market conditions<a name="markov"></a>[<sup>[6]</sup>](#markovnote).

One effective reinforcement learning method is Q-learning, which helps adjust prices dynamically by learning from past trends and optimizing decisions for profitability<a name="qlearning"></a>[<sup>[7]</sup>](#qlearningnote). Researchers have successfully used Q-learning in online retail and auction pricing, showing that it can set real-time prices more effectively than traditional methods<a name="ai-commerce"></a>[<sup>[8]</sup>](#ai-commercenote).

Very little research has used AI for pricing collectibles like Pokémon cards. Our goal is to apply reinforcement learning to make pricing more dynamic. By using past sales data and real-time trends, our model will set more accurate prices, helping sellers maximize profits and respond to market changes faster.

# Problem Statement

We want to build a dynamic pricing model that optimizes Pokémon trading card prices using reinforcement learning. Our goal is to maximize long-term profit by balancing demand trends, market value, and card rarity. Instead of fixed markups or rule-based strategies, we want our model to use real-time market data to develop an optimal pricing strategy.

The problem is:

- **Quantifiable:** the pricing decision can be expressed as a Markov Decision Process (MDP) with state variables (current price, demand, inventory), actions (price adjustments), and a reward function (profit over time).
- **Measurable:** we will use cumulative profit, price elasticity of demand, and inventory turnover rate as key metrics to evaluate the model's effectiveness.
- **Replicable:** the model will be trained on historical card price trends (1999–2023) and tested against real 2024 data and simulated market environments to assess generalizability.
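
As a rough illustration of the MDP framing described here, the sketch below encodes a state (price, demand, inventory), a set of price-adjustment actions, and a profit reward. The demand rule, the action values, `unit_cost`, and `reference_price` are all hypothetical placeholders chosen for illustration, not values taken from our data.

```python
from dataclasses import dataclass

# Hypothetical action set: relative price change (lower, hold, raise).
PRICE_ACTIONS = (-0.10, 0.0, 0.10)


@dataclass(frozen=True)
class State:
    price: float     # current listing price in dollars
    demand: int      # demand level, e.g. 0 = low, 1 = medium, 2 = high
    inventory: int   # copies of the card still in stock


def units_sold(state: State, new_price: float, reference_price: float) -> int:
    """Toy demand rule (an assumption): a price at or below the reference sells one extra copy."""
    wanted = state.demand + (1 if new_price <= reference_price else 0)
    return min(state.inventory, wanted)


def step(state: State, action: float, unit_cost: float, reference_price: float):
    """Apply a price adjustment and return (next_state, reward), where reward is profit."""
    new_price = round(state.price * (1 + action), 2)
    sold = units_sold(state, new_price, reference_price)
    reward = sold * (new_price - unit_cost)
    next_state = State(new_price, state.demand, state.inventory - sold)
    return next_state, reward


start = State(price=10.0, demand=1, inventory=5)
next_state, reward = step(start, action=0.10, unit_cost=6.0, reference_price=10.0)
print(next_state, reward)
```

In a fuller version, the demand level would also change from step to step, and the transition probabilities would be estimated from historical price data rather than hard-coded.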

# Data

We pull most of our datasets from Kaggle, looking specifically for datasets that record card-level attributes: the type of card (for example, whether it is holographic or has a unique design), its rarity within its set, its generation (which may influence rarity), the abilities and strengths of the Pokémon itself, and the resale or market price of the card.

https://www.kaggle.com/datasets/adampq/pokemon-tcg-all-cards-1999-2023
- This dataset covers all Pokémon trading cards from 1999 to 2023, allowing analysis of cards across multiple series and generations. It describes each card in detail, including attributes, abilities, attacks, rarity, legalities, and other relevant information.
- It has 29 columns and was last updated a year ago, so the data is fairly recent.
  
https://www.kaggle.com/datasets/shivd24coder/pokemon-card-collection-dataset
- This dataset is similar to the one above but adds information that is useful for analyzing how card prices are determined, such as the artist of the card, the image of the card, and URLs that link directly to the card's page on TCGPlayer.com, from which prices can be pulled.
- It has 5 columns, two of which are URLs that link to the TCGPlayer website. It was last updated a year ago, so it also reflects fairly recent data.

https://www.kaggle.com/datasets/jacklacey/pokemon-trading-cards
- We chose this dataset because it lists a specific price for each card. However, the prices are fixed values that do not show market fluctuations, which limits how flexibly we can use it.
- It has 5 columns and was last updated 3 years ago, so some of the data may be slightly outdated.

https://github.com/wjsutton/pokemon_tcg_stockmarket
- This dataset offers a closer look at the Pokémon trading card market by recording daily card prices, which lets us evaluate trends in price fluctuations. It is sourced directly from the TCGPlayer website, with variables that focus on card identification, rarity, and market price. It was last updated 4 years ago, so some of its information may be outdated; we will explore how it can serve as a starting point for finding more recent market data.

Much of this data will need to be cleaned and put into a common format so that the datasets can be used together. Some datasets may also be merged or combined to make the data easier to work with.
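
The cleaning and merging step could look roughly like the sketch below, which uses only the Python standard library. The column names (`id`, `Market Price`), the sample rows, and the prices are made up for illustration; the real datasets may name and format these fields differently.

```python
import csv
import io


def clean_price(raw: str):
    """Turn strings such as '$1,234.50' into floats; return None when the price is missing."""
    text = raw.strip().replace("$", "").replace(",", "")
    return float(text) if text else None


def load_rows(csv_text: str):
    """Read CSV text into dicts with lower-cased, underscore-separated column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key.strip().lower().replace(" ", "_"): value for key, value in row.items()}
        for row in reader
    ]


def merge_on_id(attributes, prices):
    """Inner-join the two row lists on card id, keeping only cards with a usable price."""
    price_by_id = {row["id"]: clean_price(row["market_price"]) for row in prices}
    merged = []
    for row in attributes:
        price = price_by_id.get(row["id"])
        if price is not None:
            merged.append({**row, "market_price": price})
    return merged


# Tiny made-up samples standing in for two of the sources.
attributes_csv = "ID,Name,Rarity\nbase1-4,Charizard,Rare Holo\nbase1-58,Pikachu,Common\n"
prices_csv = 'id,Market Price\nbase1-4,"$1,234.50"\nbase1-58,\n'

cards = merge_on_id(load_rows(attributes_csv), load_rows(prices_csv))
print(cards)
```

Cards without a price are dropped here because the pricing model needs a reward signal for every card it trains on; a different choice, such as imputing a price, would also be reasonable.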

# Proposed Solution

We will use reinforcement learning to create an agent that adjusts card prices dynamically to maximize long-term revenue. The model will be trained using Q-learning or Deep Q-Networks (DQN). Instead of only forecasting prices, we want the model to actively make pricing decisions based on past and real-time data. We will train the model on long-term historical data to capture price fluctuations and test it on recent months (e.g., 2024 Q1) to evaluate real-world effectiveness. As a benchmark, we will compare it with a basic rule-based fixed pricing model (e.g., cost-based pricing, which adds a markup to an item's cost to determine its selling price). Reinforcement learning suits this problem because it optimizes sequential decisions in dynamic environments, allowing the model to adapt to shifting market conditions and maximize profitability over time.
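
To make the approach concrete, here is a minimal sketch of tabular Q-learning on a toy pricing environment, alongside a fixed-markup baseline. Everything numeric in it is an assumption for illustration: the price grid, the unit cost, the demand curve in `sale_probability`, the 1.6x markup, and the hyperparameters. It is not the final model, and a DQN would replace the table with a neural network.

```python
import random
from collections import defaultdict

PRICES = [6.0, 8.0, 10.0, 12.0, 14.0]   # hypothetical price grid in dollars
ACTIONS = [-1, 0, 1]                     # move down, hold, or move up on the grid
UNIT_COST = 5.0                          # hypothetical cost per card
FIXED_INDEX = 1                          # baseline price: 1.6x markup on cost = $8.00


def sale_probability(price: float) -> float:
    """Toy demand curve (an assumption): a sale becomes less likely as price rises."""
    return max(0.0, 1.0 - price / 16.0)


def step(state: int, action: int, rng: random.Random):
    """Move along the price grid, then earn the margin if a sale happens."""
    next_state = min(len(PRICES) - 1, max(0, state + action))
    price = PRICES[next_state]
    sold = rng.random() < sale_probability(price)
    return next_state, (price - UNIT_COST) if sold else 0.0


def train(episodes=2000, horizon=30, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to its estimated value
    for _ in range(episodes):
        state = rng.randrange(len(PRICES))
        for _ in range(horizon):
            if rng.random() < epsilon:   # explore a random action
                action = rng.choice(ACTIONS)
            else:                        # exploit the current estimates
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(len(PRICES))}


def average_profit(policy, episodes=500, horizon=30, seed=1):
    """Average per-episode profit when following a state -> action mapping."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state = rng.randrange(len(PRICES))
        for _ in range(horizon):
            state, reward = step(state, policy[state], rng)
            total += reward
    return total / episodes


# Baseline: always steer toward the fixed-markup price and stay there.
baseline = {s: (FIXED_INDEX > s) - (FIXED_INDEX < s) for s in range(len(PRICES))}

q_table = train()
print("learned policy:", greedy_policy(q_table))
print("learned profit:", average_profit(greedy_policy(q_table)))
print("baseline profit:", average_profit(baseline))
```

Both policies are scored by the same `average_profit` function so that the comparison with the benchmark uses identical market conditions.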

# Evaluation Metrics

We will evaluate profitability by comparing the cumulative profit of the reinforcement learning-based strategy against that of the benchmark model over the same test period.
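
The sketch below shows one way to compute cumulative profit, together with the two other metrics named in the Problem Statement (price elasticity of demand and inventory turnover rate). The specific definitions are assumptions: elasticity uses the midpoint (arc) formula, and turnover is measured in units rather than dollars. The sample numbers are made up.

```python
def cumulative_profit(prices, units_sold, unit_cost):
    """Sum of (price - cost) * units over all periods."""
    return sum((p - unit_cost) * q for p, q in zip(prices, units_sold))


def arc_price_elasticity(p0, p1, q0, q1):
    """Midpoint (arc) elasticity: percent change in quantity over percent change in price."""
    pct_quantity = (q1 - q0) / ((q0 + q1) / 2)
    pct_price = (p1 - p0) / ((p0 + p1) / 2)
    return pct_quantity / pct_price


def inventory_turnover(units_sold, start_inventory, end_inventory):
    """Units sold divided by the average inventory held over the period."""
    return sum(units_sold) / ((start_inventory + end_inventory) / 2)


# Made-up three-period example for one card.
prices = [10.0, 12.0, 12.0]
sold = [3, 1, 2]

print(cumulative_profit(prices, sold, unit_cost=6.0))
print(arc_price_elasticity(p0=10.0, p1=12.0, q0=3, q1=1))
print(inventory_turnover(sold, start_inventory=10, end_inventory=4))
```

Computing the same three functions on the benchmark's price and sales series gives a like-for-like comparison between the two strategies.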

# Results

You may have done tons of work on this. Not all of it belongs here. 

Reports should have a __narrative__. Once you've looked through all your results over the quarter, decide on one main point and 2-4 secondary points you want us to understand. Include the detailed code and analysis results of those points only; you should spend more time/code/plots on your main point than the others.

If you went down any blind alleys that you later decided to not pursue, please don't abuse the TAs' time by throwing in 81 lines of code and 4 plots related to something you actually abandoned.  Consider deleting things that are not important to your narrative.  If it's slightly relevant to the narrative or you just want us to know you tried something, you could keep it in by summarizing the result in this report in a sentence or two, moving the actual analysis to another file in your repo, and providing us a link to that file.

### Subsection 1

You will likely have different subsections as you go through your report. For instance you might start with an analysis of the dataset/problem and from there you might be able to draw out the kinds of algorithms that are / aren't appropriate to tackle the solution.  Or something else completely if this isn't the way your project works.

### Subsection 2

Another likely section is if you are doing any feature selection through cross-validation or hand-design/validation of features/transformations of the data

### Subsection 3

Probably you need to describe the base model and demonstrate its performance.  Probably you should include a learning curve to demonstrate how much better the model gets as you increase the number of trials

### Subsection 4

Perhaps some exploration of the model selection (hyper-parameters) or algorithm selection task. Generally reinforcement learning tasks may require a huge amount of training, so extensive grid search is unlikely to be possible. However exploring a few reasonable hyper-parameters may still be possible.  Validation curves, plots showing the variability of performance across folds of the cross-validation, etc. If you're doing one, the outcome of the null hypothesis test or parsimony principle check to show how you are selecting the best model.

### Subsection 5 

Maybe you do model selection again, but using a different kind of metric than before?  Or you compare a completely different approach/algorithm to the problem? Whatever, this stuff is just serving suggestions.



# Discussion

### Interpreting the result

OK, you've given us quite a bit of tech information above, now it's time to tell us what to pay attention to in all that.  Think clearly about your results, decide on one main point and 2-4 secondary points you want us to understand. Highlight HOW your results support those points.  You probably want 2-5 sentences per point.


### Limitations

Are there any problems with the work?  For instance would more data change the nature of the problem? Would it be good to explore more hyperparams than you had time for?   


### Future work
Looking at the limitations and/or the toughest parts of the problem and/or the situations where the algorithm(s) did the worst... is there something you'd like to try to make these better.

### Ethics & Privacy

If your project has obvious potential concerns with ethics or data privacy discuss that here.  Almost every ML project put into production can have ethical implications if you use your imagination. Use your imagination.

Even if you can't come up with an obvious ethical concern that should be addressed, you should know that a large number of ML projects that go into production have unintended consequences and ethical problems once in production. How will your team address these issues?

Consider a tool to help you address the potential issues such as https://deon.drivendata.org

### Conclusion

Reiterate your main point and in just a few sentences tell us how your results support it. Mention how this work would fit in the background/context of other work in this field if you can. Suggest directions for future work if you want to.

# Footnotes
<a name="shillernote"></a>[<sup>[1]</sup>](#shiller) Shiller, Robert J. *Narrative Economics: How Stories Go Viral and Drive Economic Events.* Princeton University Press, 2019. [https://press.princeton.edu/books/hardcover/9780691182292/narrative-economics](https://press.princeton.edu/books/hardcover/9780691182292/narrative-economics).

<a name="martineznote"></a>[<sup>[2]</sup>](#martinez) Martinez, Blake. *"How I Made Millions Selling Pokémon Cards After Leaving the NFL."* CNBC, 26 Apr. 2023. [https://www.cnbc.com/2023/04/26/blake-martinez-pokemon-card-side-hustle-company-brings-in-millions.html](https://www.cnbc.com/2023/04/26/blake-martinez-pokemon-card-side-hustle-company-brings-in-millions.html). Accessed 14 Feb. 2025.

<a name="platformsnote"></a>[<sup>[3]</sup>](#platforms) Rochet, Jean-Charles, and Jean Tirole. *"Platform Competition in Two-Sided Markets."* *Journal of the European Economic Association*, vol. 1, no. 4, 2003, pp. 990-1029. [https://academic.oup.com/jeea/article/1/4/990/2280902](https://academic.oup.com/jeea/article/1/4/990/2280902).

<a name="networknote"></a>[<sup>[4]</sup>](#network) Evans, David S., and Richard Schmalensee. *"The Economics of Two-Sided Markets."* *Review of Network Economics*, vol. 6, no. 2, 2007, pp. 1-26.

<a name="ai-pricingnote"></a>[<sup>[5]</sup>](#ai-pricing) Bertsimas, Dimitris, and Nathan Kallus. *"From Predictive to Prescriptive Analytics."* *Management Science*, vol. 65, no. 3, 2019, pp. 1027-1049. [https://pubsonline.informs.org/doi/10.1287/mnsc.2018.3253](https://pubsonline.informs.org/doi/10.1287/mnsc.2018.3253).

<a name="markovnote"></a>[<sup>[6]</sup>](#markov) Puterman, Martin L. *Markov Decision Processes: Discrete Stochastic Dynamic Programming.* John Wiley & Sons, 1994. [https://onlinelibrary.wiley.com/doi/chapter-epub/10.1002/9780470316887.fmatter](https://onlinelibrary.wiley.com/doi/chapter-epub/10.1002/9780470316887.fmatter).

<a name="qlearningnote"></a>[<sup>[7]</sup>](#qlearning) Watkins, Christopher J. C. H., and Peter Dayan. *"Q-learning."* *Machine Learning*, vol. 8, no. 3-4, 1992, pp. 279-292. [https://link.springer.com/article/10.1007/BF00992698](https://link.springer.com/article/10.1007/BF00992698).

<a name="ai-commercenote"></a>[<sup>[8]</sup>](#ai-commerce) Tesauro, Gerald, and Jeffrey O. Kephart. *"Pricing in Agent Economies Using Multi-Agent Q-Learning."* *Autonomous Agents and Multi-Agent Systems*, vol. 5, no. 3, 2002, pp. 289-304. [https://www.researchgate.net/publication/2820310_Pricing_in_Agent_Economies_Using_Multi-Agent_Q-Learning](https://www.researchgate.net/publication/2820310_Pricing_in_Agent_Economies_Using_Multi-Agent_Q-Learning).

