# Introduction to Data Science – Project Proposals
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

Due: Friday, March 7, 2025, 11:59 pm.



# Project Title: Microeconomic Analysis of Wheat Pricing and Market Behavior in Utah: Exploring Local Agricultural Trends

## Team Members:
Name (email, UID)
- **YUYANG YAO** (<u1400651@umail.utah.edu>, u1400651)
- **NICLAS SCOTT HOLMAN** (<u1281501@umail.utah.edu>, u1281501)
- **WEITONG NIU** (<u1345736@umail.utah.edu>, u1345736)

## Background and Motivation

The motivation for this project arises from the ongoing importance of wheat as a staple agricultural commodity, particularly in local markets such as Utah. Wheat pricing is influenced by numerous factors including production levels, supply chain issues, export-import dynamics, and local consumption patterns. As wheat plays a crucial role in local economies, understanding the microeconomic factors behind its pricing and market behavior is vital for informed decision-making by local farmers, policymakers, and businesses.

We are particularly interested in understanding how local factors, such as environmental conditions, production efficiency, and trade policies, affect wheat prices in Utah. In recent years, fluctuations in agricultural commodity prices have had significant economic consequences for both producers and consumers in the region. By exploring these local dynamics, we aim to provide insights that could help stakeholders manage the risks of price volatility and improve agricultural practices.

Through this analysis, we hope to contribute to a broader understanding of the agricultural markets in Utah, which can potentially be applied to other regions with similar economic and agricultural structures. Furthermore, this study aligns with current trends in using microeconomic theory to analyze agricultural markets and predict future market behaviors, particularly in the context of policy changes or shifts in production patterns.

## Project Objectives

- **Primary Questions**:
  - How do local factors, such as production levels, imports, exports, and stocks, influence wheat prices in Utah?
  - What are the key determinants of wheat supply and demand in Utah's wheat market?
  - Can we identify significant trends in wheat pricing and market behavior based on historical data?

- **Learning Objectives**:
  - Develop expertise in applying microeconomic concepts, such as supply and demand models, market equilibrium, and price elasticity, to an agricultural market context.
  - Gain experience in working with time series data, including understanding seasonal trends and their impact on pricing.
  - Understand the role of local markets within broader national and global agricultural trade systems.
  - Learn how to clean and prepare data, conduct exploratory analysis, and develop forecasting models using historical agricultural data.


## Data Description and Acquisition

- **Data Format**: The data will be in CSV format, which includes historical quarterly data on wheat production, exports, imports, consumption, and stocks by class.
- **Size**: The dataset consists of several thousand records spanning multiple decades, from the 1973/74 marketing year to the present. Each row represents a quarterly observation of wheat balance sheet data for a specific class of wheat.
- **Attributes**: The dataset includes the following key features:
  - MarketingYear (marketing year of the observation, e.g., 1973/74)
  - TimePeriod (Quarterly time period)
  - Commodity (Wheat)
  - Class (Wheat class, e.g., Hard Red Winter, Soft Red Winter, etc.)
  - Attribute (Data type such as Production, Exports, Consumption, etc.)
  - GeographicalLevel (e.g., national, regional)
  - Location (Country, e.g., United States)
  - Value (The numeric value of the specific attribute)
  - Unit (Measurement unit, e.g., bushels, tons)
  
- **Source**: The data is publicly available from the U.S. Department of Agriculture (USDA) Economic Research Service (ERS) and the USDA World Agricultural Outlook Board (WAOB). We will download the data from the ERS historical by-class wheat data repository: [USDA Wheat Data](https://www.ers.usda.gov/data-products/wheat-data/).
  
- **Structure**: The dataset is structured in a tabular format, where each row corresponds to one time period for a specific wheat class and its associated data attribute.
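Because each row holds a single attribute, most analyses will first need the table reshaped so that attributes such as Production and Exports become columns. A minimal pandas sketch of that reshaping follows. It uses a few made-up rows (not USDA figures), and the quarter labels `Q1`/`Q2` are hypothetical stand-ins for the file's actual `TimePeriod` values.

```python
import pandas as pd

# Toy rows in the long layout described above (made-up numbers, not USDA figures)
long_df = pd.DataFrame({
    'MarketingYear': ['2020/21'] * 4,
    'TimePeriod': ['Q1', 'Q1', 'Q2', 'Q2'],      # hypothetical quarter labels
    'Class': ['Hard Red Winter'] * 4,
    'Attribute': ['Production', 'Exports', 'Production', 'Exports'],
    'Value': [100.0, 40.0, 0.0, 35.0],
})

# One row per (year, quarter, class), one column per attribute
wide_df = long_df.pivot_table(
    index=['MarketingYear', 'TimePeriod', 'Class'],
    columns='Attribute',
    values='Value',
    aggfunc='sum',
).reset_index()
wide_df.columns.name = None  # drop the leftover 'Attribute' label on the column axis
print(wide_df)
```

`pivot_table` is used rather than `pivot` so that accidental duplicate rows are summed instead of raising an error; if duplicates should never occur, `pivot` would surface them instead.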


In [4]:
```python
# Example: loading and inspecting the dataset
import pandas as pd

# Load the dataset (replace the placeholder with the path to the downloaded CSV)
data = pd.read_csv('path_to_your_wheat_data.csv')

# Display the first few rows to understand its structure
data.head()
```


## Ethical Considerations

- **Stakeholders**:
  - Local farmers and wheat producers in Utah could benefit from better understanding of price fluctuations and market trends.
  - Policymakers and agricultural economists in the region may use the findings to optimize local agricultural policies and programs.
  - Consumers in Utah and the broader region, who are affected by changes in wheat prices, could benefit indirectly from more stable pricing or improved market efficiency.
  
- **Potential Harm**:
  While this analysis focuses on publicly available historical data and does not include any personal or sensitive information, there is still the potential for misinterpretation of the results. For instance, if the findings were used to make broad assumptions about the future without considering changing market dynamics, it could mislead stakeholders. However, since the data is aggregated and does not contain any identifiable personal information, ethical risks are minimal.


## Data Cleaning and Processing

We anticipate several data cleaning tasks, including:
- Handling missing values: Some rows may have missing or incomplete data, especially in older records. We plan to impute missing values using mean imputation or drop rows with excessive missing data.
- Ensuring consistent formatting: The dataset may have inconsistent units (e.g., tons vs. bushels), and we will standardize units across all observations.
- Aggregating data: We may need to aggregate quarterly data to yearly summaries to simplify analysis and remove noise from seasonal fluctuations.
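Standardizing units could look like the sketch below. It assumes the `Unit` column labels bushel rows as `Bushels` or `Million bushels` (hypothetical labels to check against the downloaded file) and converts them to metric tons using the standard 60-pound bushel of wheat. The function name `to_metric_tons` is illustrative, and metric tons are chosen as the common unit only for the example.

```python
import pandas as pd

# A U.S. bushel of wheat is standardized at 60 lb, and 1 lb = 0.45359237 kg
KG_PER_BUSHEL_WHEAT = 60 * 0.45359237              # about 27.2155 kg
METRIC_TONS_PER_BUSHEL = KG_PER_BUSHEL_WHEAT / 1000

# Conversion factors keyed by hypothetical Unit labels; check the real labels in the file
TO_METRIC_TONS = {'Bushels': METRIC_TONS_PER_BUSHEL,
                  'Million bushels': 1e6 * METRIC_TONS_PER_BUSHEL}


def to_metric_tons(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with bushel-denominated rows converted to metric tons."""
    out = df.copy()
    out['Value'] = out['Value'].astype(float)
    factor = out['Unit'].map(TO_METRIC_TONS)   # NaN for units not in the table
    mask = factor.notna()
    out.loc[mask, 'Value'] = out.loc[mask, 'Value'] * factor[mask]
    out.loc[mask, 'Unit'] = 'Metric tons'
    return out


# Usage on three made-up rows
example = pd.DataFrame({'Value': [1.0, 2.0, 5.0],
                        'Unit': ['Million bushels', 'Metric tons', 'Bushels']})
converted = to_metric_tons(example)
print(converted)
```

Rows with units outside the lookup table are left untouched, so any unexpected label shows up as a leftover unit rather than being silently rescaled.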

Additionally, we plan to create new features such as:
- Adjusted production values that account for inflation (deflating nominal dollar figures with a price index such as the CPI) or changing market conditions.
- Price indices based on supply-demand relationships in the dataset.
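A sketch of both kinds of features, under stated assumptions: the table is already one row per marketing year, with hypothetical columns `NominalValue` (dollars), `EndingStocks`, and `TotalUse`, plus a separate price-index series (e.g., the CPI) keyed by marketing year. Deflation rescales nominal dollars to base-year dollars; the stocks-to-use ratio (ending stocks divided by total use) is one common supply-demand indicator that a price index could build on. The helper `add_features` is illustrative, not part of any library.

```python
import pandas as pd


def add_features(yearly: pd.DataFrame, cpi: pd.Series, base_year: str) -> pd.DataFrame:
    """Add a real-dollar column and a stocks-to-use ratio to a one-row-per-year table.

    Hypothetical input columns: 'MarketingYear', 'NominalValue', 'EndingStocks', 'TotalUse'.
    `cpi` is a price-index series indexed by marketing year.
    """
    out = yearly.copy()
    # Deflate: real = nominal * CPI(base year) / CPI(year of observation)
    cpi_t = out['MarketingYear'].map(cpi)
    out['RealValue'] = out['NominalValue'] * cpi[base_year] / cpi_t
    # Share of the year's total use still held as ending stocks (lower = tighter market)
    out['StocksToUse'] = out['EndingStocks'] / out['TotalUse']
    return out


# Usage on made-up numbers (not USDA or BLS figures)
yearly = pd.DataFrame({'MarketingYear': ['2019/20', '2020/21'],
                       'NominalValue': [100.0, 110.0],
                       'EndingStocks': [30.0, 25.0],
                       'TotalUse': [120.0, 125.0]})
cpi = pd.Series({'2019/20': 100.0, '2020/21': 110.0})
features = add_features(yearly, cpi, base_year='2019/20')
print(features[['MarketingYear', 'RealValue', 'StocksToUse']])
```

In this toy example the nominal rise from 100 to 110 exactly matches the index rise, so both years come out at 100 in base-year dollars.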

In [None]:
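```python
# Example: handling missing values and aggregating quarterly data to marketing years
import pandas as pd

# Make sure Value is numeric; entries that cannot be parsed become NaN
data['Value'] = pd.to_numeric(data['Value'], errors='coerce')

# Mean-impute within each class/attribute series, so a missing Production value
# is never filled with an average that also includes Exports or Stocks values
data['Value'] = data.groupby(['Class', 'Attribute'])['Value'].transform(lambda s: s.fillna(s.mean()))

# Sum quarterly flows to marketing-year totals per class and attribute
# (stocks are point-in-time levels, so summing them across quarters is not meaningful)
yearly_data = data.groupby(['MarketingYear', 'Class', 'Attribute'], as_index=False)['Value'].sum()
yearly_data.head()
```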

## Exploratory Analysis

We will begin our exploratory data analysis (EDA) by examining the trends in wheat production, exports, consumption, and prices over time. Specifically, we will:
- Create time series plots to visualize how each variable changes across quarters and years.
- Generate correlation matrices to explore the relationships between wheat supply, demand, and price.
- Use histograms to analyze the distribution of wheat pricing data.

We will also look at trends by class to understand how different wheat types behave in the market.
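The correlation-matrix and seasonality steps can be sketched with pandas alone. The helper `eda_summaries` below is hypothetical; it assumes the long layout described in the data section and returns the pairwise Pearson correlations between the chosen attributes together with their average value by quarter. The toy rows and quarter labels are made up for illustration.

```python
import pandas as pd


def eda_summaries(data: pd.DataFrame, attributes: list) -> tuple:
    """Return (correlation matrix, average by quarter) for the chosen attributes.

    Expects the long layout: MarketingYear, TimePeriod, Class, Attribute, Value.
    """
    subset = data[data['Attribute'].isin(attributes)]
    wide = subset.pivot_table(index=['MarketingYear', 'TimePeriod', 'Class'],
                              columns='Attribute', values='Value', aggfunc='sum')
    # Pearson correlations between attributes across all (year, quarter, class) rows
    corr = wide.corr()
    # Mean of each attribute by quarter: a first look at seasonal patterns
    seasonal = wide.groupby(level='TimePeriod').mean()
    return corr, seasonal


# Usage on made-up rows with hypothetical quarter labels
toy = pd.DataFrame({
    'MarketingYear': ['2020/21'] * 6,
    'TimePeriod': ['Q1', 'Q2', 'Q3'] * 2,
    'Class': ['Hard Red Winter'] * 6,
    'Attribute': ['Production'] * 3 + ['Exports'] * 3,
    'Value': [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})
corr, seasonal = eda_summaries(toy, ['Production', 'Exports'])
print(corr)
print(seasonal)
```

The correlation matrix can then be drawn with `sns.heatmap(corr, annot=True)`, and a price histogram with `sns.histplot` on the rows whose `Attribute` is the price series. Pooling all classes into one correlation can hide class-specific behavior, so it may be worth repeating the calculation per class.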

In [None]:
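```python
# Example: plotting wheat production by class over time
import matplotlib.pyplot as plt
import seaborn as sns

# The data is in long format, so keep only the Production rows
production = data[data['Attribute'] == 'Production']
# Total production per marketing year and class
yearly_production = production.groupby(['MarketingYear', 'Class'], as_index=False)['Value'].sum()

plt.figure(figsize=(10, 6))
sns.lineplot(x='MarketingYear', y='Value', hue='Class', data=yearly_production)
plt.title('Wheat Production by Class Over Time')
plt.xlabel('Marketing year')
# Label the axis with the unit recorded in the data (assumes one unit across these rows)
plt.ylabel(f"Production ({production['Unit'].iloc[0]})")
plt.xticks(rotation=45)
plt.show()
```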

## Analysis Methodology

To analyze the wheat market dynamics, we will employ the following methods:
- **Time Series Forecasting**: Using ARIMA and possibly LSTM models to forecast wheat production and pricing trends.
- **Supply and Demand Analysis**: Implementing basic supply-demand models to understand how fluctuations in supply (production) and demand (consumption, exports) influence prices.
- **Regression Analysis**: Using regression models to quantify the impact of various factors, such as production levels, exports, and imports, on wheat prices in Utah.
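For the supply-and-demand analysis, a common first step is to estimate a price elasticity from a constant-elasticity model, $\ln Q = a + \varepsilon \ln P$, where the slope $\varepsilon$ is the elasticity. A minimal NumPy sketch follows; the function name `log_log_elasticity` is hypothetical. Regressing observed quantities on observed prices mixes supply and demand shifts, so this slope is descriptive rather than a causal demand or supply elasticity unless an identification strategy such as instrumental variables is added.

```python
import numpy as np


def log_log_elasticity(price, quantity) -> float:
    """Slope of ln(quantity) on ln(price), i.e. e in ln Q = a + e * ln P (OLS fit)."""
    slope, _intercept = np.polyfit(np.log(price), np.log(quantity), deg=1)
    return float(slope)


# Usage: quantities built exactly from Q = 100 * P ** -0.5 (made-up, elasticity -0.5)
prices = np.array([4.0, 5.0, 6.0, 7.0])
quantities = 100 * prices ** -0.5
estimate = log_log_elasticity(prices, quantities)
print(round(estimate, 6))
```

Because the toy quantities follow the model exactly, the fitted slope recovers -0.5; with real data the estimate will carry noise and the bias described above.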

We will focus on modeling price formation and predicting future price trends based on historical data.
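As a baseline to compare ARIMA or LSTM forecasts against, a first-order autoregressive model $y_t = c + \phi y_{t-1}$ can be fitted by least squares and iterated forward. The sketch below (hypothetical helper `ar1_forecast`, NumPy only) covers just the AR(1) special case, not full ARIMA; a library implementation such as statsmodels' `ARIMA` class would add differencing, moving-average terms, and standard errors.

```python
import numpy as np


def ar1_forecast(series, steps: int) -> np.ndarray:
    """Fit y_t = c + phi * y_{t-1} by least squares and forecast `steps` periods ahead."""
    y = np.asarray(series, dtype=float)
    # Design matrix: a constant column and the lagged series
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    forecasts = []
    last = y[-1]
    for _ in range(steps):
        last = c + phi * last   # feed each forecast back in as the next lag
        forecasts.append(last)
    return np.array(forecasts)


# Usage: a made-up series that follows y_t = 1 + 0.5 * y_{t-1} exactly
series = np.array([10.0, 6.0, 4.0, 3.0, 2.5, 2.25])
print(ar1_forecast(series, steps=2))
```

For this series the fit recovers $c = 1$ and $\phi = 0.5$, so the next two forecasts are 2.125 and 2.0625. Any richer model that cannot beat such a baseline on held-out years is probably overfitting.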



In [None]:
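```python
# Example: linear regression of wheat price on production, exports, and imports
from sklearn.linear_model import LinearRegression

# Reshape from long format (one row per attribute) to one column per attribute
wide = data.pivot_table(index=['MarketingYear', 'TimePeriod', 'Class'],
                        columns='Attribute', values='Value', aggfunc='sum').reset_index()

# Assumes a 'Price' attribute exists or has been merged in from a separate price table
features = ['Production', 'Exports', 'Imports']
wide = wide.dropna(subset=features + ['Price'])
X = wide[features]
y = wide['Price']

# Fit an ordinary least squares model
model = LinearRegression()
model.fit(X, y)

# Coefficients labeled by feature name
dict(zip(features, model.coef_))
```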

## Project Schedule

- **Week 1**: Data acquisition and cleaning (Assigned to: [Your Name])
- **Week 2**: Exploratory data analysis and initial visualizations (Assigned to: [Team Member's Name])
- **Week 3**: Analysis and model development (Assigned to: [Your Name])
- **Week 4**: Final analysis, report writing, and presentation preparation (Assigned to: [All team members])