# Assignment 1: Stata Basics with Auto Dataset

**Total Points: 10**

This assignment tests your understanding of basic Stata commands using the `auto` dataset that ships with Stata.

## Setup (Read-only)

Run this cell to load the data. Do not modify it.

In [9]:
* Load the auto dataset
sysuse auto, clear
* Ensure clean state
drop if missing(price) | missing(mpg)

(1978 automobile data)
(0 observations deleted)


## Question 1: Create a New Variable (2 points)

Create a new variable called `expensive` that equals 1 if the car price is greater than 6000, and 0 otherwise.

In [10]:
* Create the expensive variable
* YOUR CODE HERE
gen expensive = price > 6000

In [11]:
* TEST CELL - DO NOT MODIFY
* Check if expensive variable exists
capture confirm variable expensive
assert _rc == 0

* Check the values are correct
quietly count if price > 6000 & expensive != 1
assert r(N) == 0
quietly count if price <= 6000 & expensive != 0
assert r(N) == 0

* Check that all values lie between 0 and 1
quietly summarize expensive
assert r(min) >= 0 & r(max) <= 1

display "✓ Question 1 passed all tests!"

✓ Question 1 passed all tests!
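A note on the Question 1 solution: the setup cell drops observations with a missing `price`, so `price > 6000` is safe here. In general, though, Stata treats a numeric missing value as larger than any number, so a bare comparison codes a missing price as 1. The sketch below illustrates this on a temporary copy of the data. The variable and scalar names (`exp_naive`, `exp_safe`, `naive_obs1`, `safe_obs1`) are hypothetical, and blanking out the first price is an artificial edit made only for the demonstration.

```stata
* Sketch: why a comparison such as price > 6000 needs a missing-value guard.
* Stata treats numeric missing (.) as larger than any number, so this prints 1.
display (. > 6000)

* preserve/restore leaves the notebook's working data untouched.
preserve
sysuse auto, clear

* Hypothetical edit: blank out one price to mimic a missing value.
replace price = . in 1

* Unguarded indicator: the missing price is coded 1, i.e. "expensive".
generate byte exp_naive = price > 6000

* Guarded indicator: the missing price stays missing.
generate byte exp_safe = price > 6000 if !missing(price)

* Keep the two codings for observation 1 as scalars before restoring.
scalar naive_obs1 = exp_naive[1]
scalar safe_obs1  = exp_safe[1]
restore

display "unguarded: " naive_obs1 "   guarded: " safe_obs1
```

The unguarded version should report 1 and the guarded version a missing value (`.`). Because of `preserve`/`restore`, the `expensive` variable created above is not affected.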


## Question 2: Calculate Summary Statistics (2 points)

Calculate the mean price separately for foreign cars (`foreign == 1`) and domestic cars (`foreign == 0`). Store the mean price of foreign cars in a scalar called `foreign_mean` and the mean price of domestic cars in a scalar called `domestic_mean`.

In [12]:
* Calculate mean prices by foreign status
* YOUR CODE HERE
quietly summarize price if foreign == 1
scalar foreign_mean = r(mean)

quietly summarize price if foreign == 0
scalar domestic_mean = r(mean)

In [13]:
* TEST CELL - DO NOT MODIFY
* Check if scalars exist
capture scalar list foreign_mean
assert _rc == 0
capture scalar list domestic_mean
assert _rc == 0

* Check the values are correct (with small tolerance for rounding)
quietly summarize price if foreign == 1
assert abs(foreign_mean - r(mean)) < 0.01

quietly summarize price if foreign == 0
assert abs(domestic_mean - r(mean)) < 0.01

display "✓ Question 2 passed all tests!"
display "Foreign mean: " foreign_mean
display "Domestic mean: " domestic_mean

✓ Question 2 passed all tests!
Foreign mean: 6384.6818
Domestic mean: 6072.4231
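One possible alternative for Question 2, sketched as an illustration rather than the expected answer: loop over the two values of `foreign` and store each mean in a scalar. The scalar names `mean_price_0` and `mean_price_1` are hypothetical, and the loop assumes `foreign` is coded 0/1, as it is in the auto data.

```stata
* Sketch: both group means computed in one loop over the levels of foreign.
* meanonly suppresses the output table but still stores r(mean).
* Scalar names mean_price_0 and mean_price_1 are hypothetical.
preserve
sysuse auto, clear
forvalues f = 0/1 {
    summarize price if foreign == `f', meanonly
    scalar mean_price_`f' = r(mean)
}
restore

display "Domestic (foreign == 0): " mean_price_0
display "Foreign  (foreign == 1): " mean_price_1
```

Because the setup cell dropped no observations, these should match the means printed above (6072.4231 and 6384.6818). For a grouping variable with arbitrary codes, `levelsof` can supply the loop values instead of `forvalues`.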


## Question 3: Run a Regression (3 points)

Run a regression of price on mpg, weight, and foreign. Store the R-squared value in a scalar called `rsq`.

In [None]:
* Run the regression and store R-squared
* YOUR CODE HERE
regress price mpg weight foreign
scalar rsq = e(r2)

In [None]:
* TEST CELL - DO NOT MODIFY
* Check if scalar exists
capture scalar list rsq
assert _rc == 0

* Run the correct regression to check
quietly regress price mpg weight foreign
assert abs(rsq - e(r2)) < 0.001

* Check that R-squared is in valid range
assert rsq >= 0 & rsq <= 1

* Check that estimation results exist and the model has three regressors
assert e(N) > 0
assert e(df_m) == 3

display "✓ Question 3 passed all tests!"
display "R-squared: " rsq
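To make clear what `e(r2)` holds: for OLS with an intercept, R-squared is the model sum of squares divided by the total sum of squares, $R^2 = \mathrm{MSS} / (\mathrm{MSS} + \mathrm{RSS})$, and `regress` stores both pieces as `e(mss)` and `e(rss)`. The sketch below recomputes R-squared from them; the scalar names `r2_manual` and `r2_stored` are hypothetical.

```stata
* Sketch: recompute R-squared from the sums of squares that regress stores.
* R^2 = MSS / (MSS + RSS) for OLS with an intercept.
* Scalar names r2_manual and r2_stored are hypothetical.
preserve
sysuse auto, clear
quietly regress price mpg weight foreign
scalar r2_manual = e(mss) / (e(mss) + e(rss))
scalar r2_stored = e(r2)
restore

display "stored: " r2_stored "   recomputed: " r2_manual
```

The two values should agree up to floating-point rounding. Running the sketch replaces the active `e()` results with those of the same regression.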

## Question 4: Data Manipulation (3 points)

1. Create a categorical variable `mpg_cat` with three categories:
   - 1 = "Low" (mpg < 20)
   - 2 = "Medium" (20 <= mpg < 25)
   - 3 = "High" (mpg >= 25)
2. Count how many cars are in each category and store the counts in scalars `n_low`, `n_medium`, and `n_high`.

In [None]:
* Create categorical variable and count observations
* YOUR CODE HERE
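generate byte mpg_cat = .
replace mpg_cat = 1 if mpg < 20
replace mpg_cat = 2 if mpg >= 20 & mpg < 25
replace mpg_cat = 3 if mpg >= 25

* Attach the category names from the task as value labels
label define mpg_cat_lbl 1 "Low" 2 "Medium" 3 "High", replace
label values mpg_cat mpg_cat_lbl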

quietly count if mpg_cat == 1
scalar n_low = r(N)

quietly count if mpg_cat == 2
scalar n_medium = r(N)

quietly count if mpg_cat == 3
scalar n_high = r(N)

In [None]:
* TEST CELL - DO NOT MODIFY
* Check variable exists
capture confirm variable mpg_cat
assert _rc == 0

* Check categorization is correct
quietly count if mpg < 20 & mpg_cat != 1
assert r(N) == 0
quietly count if mpg >= 20 & mpg < 25 & mpg_cat != 2
assert r(N) == 0
quietly count if mpg >= 25 & mpg_cat != 3
assert r(N) == 0

* Check scalars exist and are correct
capture scalar list n_low n_medium n_high
assert _rc == 0

quietly count if mpg_cat == 1
assert n_low == r(N)
quietly count if mpg_cat == 2
assert n_medium == r(N)
quietly count if mpg_cat == 3
assert n_high == r(N)

* Check total adds up
assert n_low + n_medium + n_high == _N

display "✓ Question 4 passed all tests!"
display "Low MPG cars: " n_low
display "Medium MPG cars: " n_medium
display "High MPG cars: " n_high
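As a cross-check for Question 4, the sketch below builds the same bins with `egen`'s `cut()` function and reads the counts from the frequency matrix that the `matcell()` option of `tabulate` saves. All names in it (`mpg_cat_alt`, `mpg_cat_alt_lbl`, `freq`, `alt_low`, `alt_medium`, `alt_high`) are hypothetical, and the upper break of 100 is an arbitrary bound chosen to lie above every `mpg` value in the data.

```stata
* Sketch: the same three bins built with egen's cut() function, with the
* counts read from the matrix that tabulate's matcell() option saves.
* Names mpg_cat_alt, mpg_cat_alt_lbl, freq, and alt_* are hypothetical.
preserve
sysuse auto, clear

* cut() bins are left-closed, [lower, upper), so the breaks 0, 20, 25, 100
* give mpg < 20, 20 <= mpg < 25, and 25 <= mpg < 100. Values outside the
* breaks would become missing; all auto mpg values lie inside them.
egen mpg_cat_alt = cut(mpg), at(0,20,25,100) icodes

* icodes numbers the bins 0, 1, 2; shift them to the 1, 2, 3 coding.
replace mpg_cat_alt = mpg_cat_alt + 1
label define mpg_cat_alt_lbl 1 "Low" 2 "Medium" 3 "High"
label values mpg_cat_alt mpg_cat_alt_lbl

* matcell() stores the frequencies as a column vector, one row per category.
tabulate mpg_cat_alt, matcell(freq)
scalar alt_low    = freq[1,1]
scalar alt_medium = freq[2,1]
scalar alt_high   = freq[3,1]
restore
```

One design caveat: `tabulate` omits categories with zero observations, so reading counts by row position assumes all three bins are non-empty. The explicit `count if` approach in the solution cell does not depend on that.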

## Submission

Great job! You've completed the assignment. Make sure all cells have been run and your outputs are visible.