# Data Cleaning for Stocks: BYND, TSN, JBS USA

### Tyson Foods (TSN) and JBS USA (JBSAY) are the top food processors in the USA. These two companies were chosen as the market leaders in food processing so that trends in their stock prices can be examined. Because conventional meat has run into trouble, Beyond Meat (BYND), a company that specializes in plant-based food, is included to compare against the performance of Tyson and JBS.

In [1]:
import os
import pandas as pd
from datetime import datetime, timedelta
import yfinance as yf

## Load BYND and TSN from Yahoo Finance, 1 year's worth of data

In [2]:
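BYND = yf.Ticker("BYND")
TSN = yf.Ticker("TSN")
# One year of daily price history per ticker
BYND_stock = BYND.history(period="1y")
TSN_stock = TSN.history(period="1y")

# Keep only the closing price and label the column with the ticker symbol
BYND_close = BYND_stock.drop(columns=["Open", "High", "Low", "Volume", "Dividends", "Stock Splits"])
BYND_close.columns = ["BYND"]
BYND_close.index.name = None  # clear the "Date" label on the index

TSN_close = TSN_stock.drop(columns=["Open", "High", "Low", "Volume", "Dividends", "Stock Splits"])
TSN_close.columns = ["TSN"]
TSN_close.index.name = None  # clear the "Date" label on the index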
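
The same three steps (keep `Close`, rename it to the ticker, clear the index name) repeat for every ticker, so they could be folded into one helper. The block below is a minimal sketch and not part of the original notebook: `close_only` is a hypothetical name, and `sample` is a hand-built stand-in for the frame returned by `Ticker.history`, filled with the first two JBSAY rows shown later in this notebook so that it runs without a network call.

```python
import pandas as pd


def close_only(history: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Return a one-column frame of closing prices, named after the ticker."""
    close = history[["Close"]].rename(columns={"Close": ticker})
    # rename_axis(None) returns a new frame whose index has no name
    return close.rename_axis(None)


# Hand-built stand-in for yf.Ticker("JBSAY").history(...)
sample = pd.DataFrame(
    {
        "Open": [10.82, 11.27],
        "High": [11.65, 11.39],
        "Low": [10.69, 11.08],
        "Close": [11.05, 11.30],
        "Volume": [74000, 65700],
        "Dividends": [0.0, 0.0],
        "Stock Splits": [0, 0],
    },
    index=pd.DatetimeIndex(["2019-05-16", "2019-05-17"], name="Date"),
)

jbsay_sample = close_only(sample, "JBSAY")
print(jbsay_sample)
```

Selecting `Close` by name, rather than dropping every other column, also keeps the helper working if the history frame gains extra columns.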

In [3]:
BYND_close

|            | BYND  |
|------------|-------|
| 2019-05-16 | 92.92 |
| 2019-05-17 | 89.35 |
| 2019-05-20 | 86.09 |
| 2019-05-21 | 77.50 |
| 2019-05-22 | 77.63 |
| 2019-05-23 | 82.10 |
| 2019-05-24 | 79.67 |
| 2019-05-28 | 86.00 |
| 2019-05-29 | 97.50 |
| 2019-05-30 | 98.59 |


## Import JBS data from Yahoo Finance API

In [4]:
jbsay = yf.Ticker("JBSAY")
jbsay_stock = jbsay.history(period="1y")
jbsay_stock.head()

| Date       | Open  | High  | Low   | Close | Volume | Dividends | Stock Splits |
|------------|-------|-------|-------|-------|--------|-----------|--------------|
| 2019-05-16 | 10.82 | 11.65 | 10.69 | 11.05 | 74000  | 0.0       | 0            |
| 2019-05-17 | 11.27 | 11.39 | 11.08 | 11.30 | 65700  | 0.0       | 0            |
| 2019-05-20 | 11.53 | 11.84 | 11.40 | 11.59 | 202700 | 0.0       | 0            |
| 2019-05-21 | 11.37 | 11.50 | 10.76 | 10.80 | 225800 | 0.0       | 0            |
| 2019-05-22 | 11.14 | 11.22 | 10.96 | 10.98 | 79800  | 0.0       | 0            |


In [5]:
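# Keep only the closing price and label the column with the ticker symbol
jbsay_close = jbsay_stock.drop(columns=["Open", "High", "Low", "Volume", "Dividends", "Stock Splits"])
jbsay_close.columns = ["JBSAY"]
jbsay_close.index.name = None  # clear the "Date" label on the index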

In [6]:
jbsay_close.head()

|            | JBSAY |
|------------|-------|
| 2019-05-16 | 11.05 |
| 2019-05-17 | 11.30 |
| 2019-05-20 | 11.59 |
| 2019-05-21 | 10.80 |
| 2019-05-22 | 10.98 |


## Concatenate the 3 Datasets

In [7]:
total_df = pd.concat([BYND_close,TSN_close,jbsay_close], axis="columns",join = "inner")
total_df.head()

|            | BYND  | TSN   | JBSAY |
|------------|-------|-------|-------|
| 2019-05-16 | 92.92 | 80.65 | 11.05 |
| 2019-05-17 | 89.35 | 80.54 | 11.30 |
| 2019-05-20 | 86.09 | 79.93 | 11.59 |
| 2019-05-21 | 77.50 | 78.70 | 10.80 |
| 2019-05-22 | 77.63 | 79.34 | 10.98 |
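
With `join="inner"`, `pd.concat` keeps only the dates that appear in every input frame, so a trading day missing from any one ticker drops out of `total_df`. The sketch below shows the effect on two small frames. The names `a` and `b` and all the prices are illustrative placeholders, not downloaded data.

```python
import pandas as pd

# Illustrative frames; the values are placeholders, not real quotes.
a = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2019-05-16", "2019-05-17", "2019-05-20"]),
)
b = pd.DataFrame(
    {"B": [10.0, 30.0]},
    index=pd.to_datetime(["2019-05-16", "2019-05-20"]),  # no row for 05-17
)

inner = pd.concat([a, b], axis="columns", join="inner")
outer = pd.concat([a, b], axis="columns", join="outer")

print(inner)  # only the two dates shared by both frames
print(outer)  # all three dates, with NaN where B has no row
```

Comparing `len(total_df)` with the length of each input frame is a quick way to see whether the inner join discarded any rows.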
