# Baseball Database Exploration using the Lahman Database

Source: http://www.seanlahman.com/baseball-archive/statistics/

The Lahman Database contains season-by-season baseball statistics going back to the 19th century. We will use this database to explore some more advanced SQL techniques.

In [1]:
# import
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# sql connect
%load_ext sql

%sql postgresql://postgres:yellowpencil@35.196.107.77/postgres

'Connected: postgres@postgres'

## Let's see what tables are available to us in the database

In [3]:
%%sql
SELECT table_name FROM information_schema.tables
WHERE "table_type" = 'BASE TABLE' AND "table_schema" = 'public'

11 rows affected.


table_name
dim_batting
dim_college
dim_hof
dim_pitching
dim_salaries
default_data
sf_crime
dim_teams
fact_players
titanic


## fact_players looks like the central table for this exercise - the database is built around the player as a key.

In [4]:
%%sql
select * from fact_players
limit 10

10 rows affected.


index,playerID,birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity,deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity,nameFirst,nameLast,nameGiven,weight,height,bats,throws,debut,finalGame,retroID,bbrefID
0,aardsda01,1981.0,12.0,27.0,USA,CO,Denver,,,,,,,David,Aardsma,David Allan,215.0,75.0,R,R,2004-04-06 00:00:00,2015-08-23 00:00:00,aardd001,aardsda01
1,aaronha01,1934.0,2.0,5.0,USA,AL,Mobile,,,,,,,Hank,Aaron,Henry Louis,180.0,72.0,R,R,1954-04-13 00:00:00,1976-10-03 00:00:00,aaroh101,aaronha01
2,aaronto01,1939.0,8.0,5.0,USA,AL,Mobile,1984.0,8.0,16.0,USA,GA,Atlanta,Tommie,Aaron,Tommie Lee,190.0,75.0,R,R,1962-04-10 00:00:00,1971-09-26 00:00:00,aarot101,aaronto01
3,aasedo01,1954.0,9.0,8.0,USA,CA,Orange,,,,,,,Don,Aase,Donald William,190.0,75.0,R,R,1977-07-26 00:00:00,1990-10-03 00:00:00,aased001,aasedo01
4,abadan01,1972.0,8.0,25.0,USA,FL,Palm Beach,,,,,,,Andy,Abad,Fausto Andres,184.0,73.0,L,L,2001-09-10 00:00:00,2006-04-13 00:00:00,abada001,abadan01
5,abadfe01,1985.0,12.0,17.0,D.R.,La Romana,La Romana,,,,,,,Fernando,Abad,Fernando Antonio,220.0,73.0,L,L,2010-07-28 00:00:00,2016-09-25 00:00:00,abadf001,abadfe01
6,abadijo01,1850.0,11.0,4.0,USA,PA,Philadelphia,1905.0,5.0,17.0,USA,NJ,Pemberton,John,Abadie,John W.,192.0,72.0,R,R,1875-04-26 00:00:00,1875-06-10 00:00:00,abadj101,abadijo01
7,abbated01,1877.0,4.0,15.0,USA,PA,Latrobe,1957.0,1.0,6.0,USA,FL,Fort Lauderdale,Ed,Abbaticchio,Edward James,170.0,71.0,R,R,1897-09-04 00:00:00,1910-09-15 00:00:00,abbae101,abbated01
8,abbeybe01,1869.0,11.0,11.0,USA,VT,Essex,1962.0,6.0,11.0,USA,VT,Colchester,Bert,Abbey,Bert Wood,175.0,71.0,R,R,1892-06-14 00:00:00,1896-09-23 00:00:00,abbeb101,abbeybe01
9,abbeych01,1866.0,10.0,14.0,USA,NE,Falls City,1926.0,4.0,27.0,USA,CA,San Francisco,Charlie,Abbey,Charles S.,169.0,68.0,L,L,1893-08-16 00:00:00,1897-08-19 00:00:00,abbec101,abbeych01


## Look at dim_batting to see what kind of statistics we can pull in for a player

In [5]:
%%sql
select * from dim_batting
limit 10

10 rows affected.


index,playerID,yearID,stint,teamID,lgID,G,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP
0,abercda01,1871,1,TRO,,1,4,0,0,0,0,0,0.0,0.0,0.0,0,0.0,,,,,
1,addybo01,1871,1,RC1,,25,118,30,32,6,0,0,13.0,8.0,1.0,4,0.0,,,,,
2,allisar01,1871,1,CL1,,29,137,28,40,4,5,0,19.0,3.0,1.0,2,5.0,,,,,
3,allisdo01,1871,1,WS3,,27,133,28,44,10,2,2,27.0,1.0,1.0,0,2.0,,,,,
4,ansonca01,1871,1,RC1,,25,120,29,39,11,3,0,16.0,6.0,2.0,2,1.0,,,,,
5,armstbo01,1871,1,FW1,,12,49,9,11,2,1,0,5.0,0.0,1.0,0,1.0,,,,,
6,barkeal01,1871,1,RC1,,1,4,0,1,0,0,0,2.0,0.0,0.0,1,0.0,,,,,
7,barnero01,1871,1,BS1,,31,157,66,63,10,9,0,34.0,11.0,6.0,13,1.0,,,,,
8,barrebi01,1871,1,FW1,,1,5,1,1,1,0,0,1.0,0.0,0.0,0,0.0,,,,,
9,barrofr01,1871,1,BS1,,18,86,13,13,2,1,0,11.0,1.0,0.0,0,0.0,,,,,


# You are now working in the front office of a baseball team. Your first assignment is a simple request: find the players who have hit the most home runs in history so that the team can bring them all in to play.
### (This ignores the fact that many players in this database are dead or long since retired, but we'll do what we're told.)

* We have our fact table, fact_players, so we can get details about the player.
* We can see that dim_batting has a field "HR" for home runs.

The query below should get us what we want, correct?

In [6]:
%%sql
select p."nameFirst", p."nameLast", sum(b."HR") as home_runs
from fact_players p, dim_batting b 
where p."playerID" = b."playerID"
group by p."nameFirst", p."nameLast"
order by home_runs desc
limit 10
;

10 rows affected.


nameFirst,nameLast,home_runs
Frank,Thomas,807
Ken,Griffey,782
Barry,Bonds,762
Hank,Aaron,755
Babe,Ruth,714
Alex,Rodriguez,696
Willie,Mays,660
Jim,Thome,612
Sammy,Sosa,609
Albert,Pujols,591


### ... and your boss throws your report back in your face and yells at you for giving him crappy results. Barry Bonds is supposed to be at the top of this list with 762 home runs! What went wrong?

In [7]:
%%sql
select *
from fact_players p
where p."nameFirst" = 'Frank' and p."nameLast" = 'Thomas'

2 rows affected.


index,playerID,birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity,deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity,nameFirst,nameLast,nameGiven,weight,height,bats,throws,debut,finalGame,retroID,bbrefID
17143,thomafr03,1929.0,6.0,11.0,USA,PA,Pittsburgh,,,,,,,Frank,Thomas,Frank Joseph,200.0,75.0,R,R,1951-08-17 00:00:00,1966-05-30 00:00:00,thomf103,thomafr03
17144,thomafr04,1968.0,5.0,27.0,USA,GA,Columbus,,,,,,,Frank,Thomas,Frank Edward,240.0,77.0,R,R,1990-08-02 00:00:00,2008-08-29 00:00:00,thomf001,thomafr04


## By excluding the playerID, our unique key for an individual person, we accidentally attributed all home runs for both Frank Thomases throughout history to one "Frank Thomas" who does not exist.

In [8]:
%%sql
select p."playerID", p."nameFirst", p."nameLast", sum(b."HR") as home_runs
from fact_players p, dim_batting b 
where p."playerID" = b."playerID"
group by p."playerID", p."nameFirst", p."nameLast"
order by home_runs desc
limit 10
;

10 rows affected.


playerID,nameFirst,nameLast,home_runs
bondsba01,Barry,Bonds,762
aaronha01,Hank,Aaron,755
ruthba01,Babe,Ruth,714
rodrial01,Alex,Rodriguez,696
mayswi01,Willie,Mays,660
griffke02,Ken,Griffey,630
thomeji01,Jim,Thome,612
sosasa01,Sammy,Sosa,609
pujolal01,Albert,Pujols,591
robinfr02,Frank,Robinson,586
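
A quick way to see how widespread the shared-name problem is would be to count distinct player IDs per name. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for the Postgres `fact_players` table, trimmed to three columns, with the two Frank Thomas IDs taken from the output above.

```python
import sqlite3

# In-memory SQLite stand-in for fact_players (assumption: the real table lives in Postgres).
conn = sqlite3.connect(":memory:")
conn.execute(
    'create table fact_players ("playerID" text primary key, "nameFirst" text, "nameLast" text)'
)
conn.executemany(
    "insert into fact_players values (?, ?, ?)",
    [
        ("thomafr03", "Frank", "Thomas"),
        ("thomafr04", "Frank", "Thomas"),
        ("bondsba01", "Barry", "Bonds"),
        ("aaronha01", "Hank", "Aaron"),
    ],
)

# Any name that maps to more than one playerID is unsafe to group by on its own.
shared_names = conn.execute(
    '''
    select "nameFirst", "nameLast", count(distinct "playerID") as n_players
    from fact_players
    group by "nameFirst", "nameLast"
    having count(distinct "playerID") > 1
    order by n_players desc
    '''
).fetchall()
print(shared_names)
```

The same `group by ... having` query should run unchanged through `%%sql` against the real table, where it would list every name shared by two or more players.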


## You are asked to identify whether a player went to college. The team needs to play smarter, so we start by identifying players who went to Harvard.

* Two tables would help us here: dim_schools and dim_college.
* As we can see below, dim_schools provides the information for a school, while dim_college joins directly back to our fact table on playerID to tell us which school a player attended and in which years.

In [9]:
%%sql
select * from dim_schools
where name_full like '%Harvard%'
limit 10;

1 rows affected.


index,schoolID,name_full,city,state,country
396,harvard,Harvard University,Cambridge,MA,USA


In [10]:
%%sql
select * from dim_college
where "schoolID" = 'harvard'
limit 10;

10 rows affected.


index,playerID,schoolID,yearID
2854,clarkwa01,harvard,1901
2855,clarkwa01,harvard,1902
2856,clarkwa01,harvard,1903
2857,clarkwa01,harvard,1904
3061,conlojo01,harvard,1920
3062,conlojo01,harvard,1921
3063,conlojo01,harvard,1922
3943,devench01,harvard,1930
3944,devench01,harvard,1931
3945,devench01,harvard,1932


### Below is one way to find players who went to Harvard. Your boss was mad that you gave him home run hitters who were dead or retired, so now he wants the Harvard results limited to players who were playing as recently as 2016.

In [11]:
%%sql
select p."playerID", p."nameFirst", p."nameLast", p."debut", p."finalGame"
from fact_players p
     INNER JOIN dim_college cp on p."playerID" = cp."playerID"
where cp."schoolID" = 'harvard'
  and p."finalGame" >= '2016-01-01'

0 rows affected.


playerID,nameFirst,nameLast,debut,finalGame


### And there are no players. The above was not useful - but how do we add whether or not a player went to college as a feature? That should be more widely applicable than just limiting our results to one school.

In [12]:
%%sql
select distinct p."playerID", p."nameFirst", p."nameLast",
        case when cp."playerID" is not null then 1 else 0 end as college_flg
from fact_players p
    LEFT JOIN dim_college cp on p."playerID" = cp."playerID"
where p."finalGame" >= '2016-01-01'
limit 10;

10 rows affected.


playerID,nameFirst,nameLast,college_flg
hartdo01,Donnie,Hart,0
richaga01,Garrett,Richards,1
duffyma01,Matt,Duffy,0
carasma01,Matt,Carasiti,0
cosarja01,Jarred,Cosart,0
johnser04,Erik,Johnson,1
straida01,Dan,Straily,1
naquity01,Tyler,Naquin,0
scribev01,Evan,Scribner,1
walshco02,Colin,Walsh,0
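
The `distinct` in the query above matters because dim_college holds one row per player per school year, so the LEFT JOIN alone repeats a player once for each year attended. The sketch below shows that effect and one possible alternative, an `exists` subquery that never multiplies rows. It is an assumption-laden toy: SQLite in memory instead of Postgres, two player IDs from the output above, and a made-up school and years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
create table fact_players ("playerID" text primary key);
create table dim_college ("playerID" text, "schoolID" text, "yearID" integer);
insert into fact_players values ('hartdo01'), ('richaga01');
-- Hypothetical school and years, only to give one player several college rows.
insert into dim_college values ('richaga01', 'school_a', 2008),
                               ('richaga01', 'school_a', 2009);
''')

# LEFT JOIN without distinct: one output row per matching college row.
left_join_rows = conn.execute('''
select p."playerID",
       case when cp."playerID" is not null then 1 else 0 end as college_flg
from fact_players p
left join dim_college cp on p."playerID" = cp."playerID"
order by p."playerID"
''').fetchall()

# EXISTS: always exactly one output row per player, no distinct needed.
exists_rows = conn.execute('''
select p."playerID",
       case when exists (select 1 from dim_college cp
                         where cp."playerID" = p."playerID")
            then 1 else 0 end as college_flg
from fact_players p
order by p."playerID"
''').fetchall()

print(left_join_rows)
print(exists_rows)
```

Either form gives the same flag; the `exists` version simply avoids relying on `distinct` to clean up after the join.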


### Good! Now that you have identified whether or not an active player went to college for your boss, he wants you to put together a dataset that can help him find future hall of famers.

The table to use for this is dim_hof.

In [13]:
%%sql
select *
from dim_hof
limit 10

10 rows affected.


index,playerID,yearid,votedBy,ballots,needed,votes,inducted,category,needed_note
0,cobbty01,1936,BBWAA,226.0,170.0,222.0,Y,Player,
1,ruthba01,1936,BBWAA,226.0,170.0,215.0,Y,Player,
2,wagneho01,1936,BBWAA,226.0,170.0,215.0,Y,Player,
3,mathech01,1936,BBWAA,226.0,170.0,205.0,Y,Player,
4,johnswa01,1936,BBWAA,226.0,170.0,189.0,Y,Player,
5,lajoina01,1936,BBWAA,226.0,170.0,146.0,N,Player,
6,speaktr01,1936,BBWAA,226.0,170.0,133.0,N,Player,
7,youngcy01,1936,BBWAA,226.0,170.0,111.0,N,Player,
8,hornsro01,1936,BBWAA,226.0,170.0,105.0,N,Player,
9,cochrmi01,1936,BBWAA,226.0,170.0,80.0,N,Player,


In [14]:
%%sql
select p."playerID", p."nameFirst", p."nameLast", sum(b."HR") as home_runs, hof."inducted"
from fact_players p 
INNER JOIN dim_batting b on p."playerID" = b."playerID"
LEFT JOIN dim_hof hof on p."playerID" = hof."playerID"
group by p."playerID", p."nameFirst", p."nameLast", hof."inducted"
order by home_runs desc
limit 25
;

25 rows affected.


playerID,nameFirst,nameLast,home_runs,inducted
murphda05,Dale,Murphy,5970,N
mcgwima01,Mark,McGwire,5830,N
cepedor01,Orlando,Cepeda,5685,N
hodgegi01,Gil,Hodges,5550,N
riceji01,Jim,Rice,5348,N
kinerra01,Ralph,Kiner,5166,N
santoro01,Ron,Santo,5130,N
parkeda01,Dave,Parker,5085,N
allendi01,Dick,Allen,4914,N
mizejo01,Johnny,Mize,4667,N


### That can't be right... There's some duplication going on in our join. Looking back at the hall of fame dimension, what probably went wrong?
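
One likely explanation is row fan-out: dim_hof has one row per player per ballot, so joining it to dim_batting repeats every batting season once per ballot before the sum is taken. Mark McGwire's 5,830 in the output above is exactly ten times 583, his career home run total, which would fit ten ballot rows. The sketch below reproduces the effect on a made-up player in an in-memory SQLite database (an assumption; the notebook itself runs on Postgres).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
create table dim_batting ("playerID" text, "yearID" integer, "HR" integer);
create table dim_hof ("playerID" text, "yearid" integer, "inducted" text);
-- Hypothetical player: two seasons (30 career home runs) and three failed ballots.
insert into dim_batting values ('playerx01', 1990, 10), ('playerx01', 1991, 20);
insert into dim_hof values ('playerx01', 2000, 'N'),
                           ('playerx01', 2001, 'N'),
                           ('playerx01', 2002, 'N');
''')

# Correct total, straight from the batting table.
true_total = conn.execute('select sum("HR") from dim_batting').fetchone()[0]

# After the join, each of the 2 batting rows appears once per ballot row (2 x 3 = 6).
joined_rows, inflated_total = conn.execute('''
select count(*), sum(b."HR")
from dim_batting b
left join dim_hof hof on b."playerID" = hof."playerID"
''').fetchone()

print(true_total, joined_rows, inflated_total)
```

The inflated total is the true total multiplied by the number of ballot rows, which is the pattern to look for whenever a joined sum looks implausibly large.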

### The query below keeps only each player's most recent ballot year, then checks whether any player still has more than one row:

In [15]:
%%sql
select "playerID", count(*) from (
select "playerID", "inducted"
from dim_hof hof
    where "yearid" = (select max("yearid") as yrmax 
                      from dim_hof tsub
                      where tsub."playerID" = hof."playerID"
                      group by "playerID"
                     )

) foo group by "playerID" having count(*) > 1
limit 50

28 rows affected.


playerID,count
applilu01,2
bauerha01,2
bendech01,2
bresnro01,2
burkeje01,2
chancfr01,3
chesbja01,2
clarkfr01,2
colliji01,2
delahed01,2


### 28 players still have more than one row in their latest ballot year, but out of thousands of records this is an improvement. Let's examine one.

In [16]:
%%sql
select * from dim_hof
where "playerID" = 'walshed01'

9 rows affected.


index,playerID,yearid,votedBy,ballots,needed,votes,inducted,category,needed_note
24,walshed01,1936,BBWAA,226.0,170.0,20.0,N,Player,
120,walshed01,1937,BBWAA,201.0,151.0,56.0,N,Player,
235,walshed01,1938,BBWAA,262.0,197.0,110.0,N,Player,
357,walshed01,1939,BBWAA,274.0,206.0,132.0,N,Player,
468,walshed01,1942,BBWAA,233.0,175.0,113.0,N,Player,
540,walshed01,1945,BBWAA,247.0,186.0,137.0,N,Player,
648,walshed01,1946,Nominating Vote,202.0,,115.0,N,Player,Top 20
651,walshed01,1946,Final Ballot,263.0,198.0,106.0,N,Player,
750,walshed01,1946,Old Timers,,,,Y,Player,


In [17]:
%%sql
select p."playerID", p."nameFirst", p."nameLast", sum(b."HR") as home_runs, hof."inducted"
from fact_players p 
INNER JOIN dim_batting b on p."playerID" = b."playerID"
LEFT JOIN (
            select "playerID", "inducted"
            from dim_hof hof
            where "yearid" = (select max("yearid") as yrmax 
                              from dim_hof tsub
                              where tsub."playerID" = hof."playerID"
                              group by "playerID"
                             )
            ) hof on p."playerID" = hof."playerID"
group by p."playerID", p."nameFirst", p."nameLast", hof."inducted"
order by home_runs desc
limit 10
;

10 rows affected.


playerID,nameFirst,nameLast,home_runs,inducted
bondsba01,Barry,Bonds,762,N
aaronha01,Hank,Aaron,755,Y
ruthba01,Babe,Ruth,714,Y
rodrial01,Alex,Rodriguez,696,
mayswi01,Willie,Mays,660,Y
griffke02,Ken,Griffey,630,Y
thomeji01,Jim,Thome,612,
sosasa01,Sammy,Sosa,609,N
pujolal01,Albert,Pujols,591,
robinfr02,Frank,Robinson,586,Y


### Better, but we still don't have binary results: players with no rows in dim_hof come back with an empty value instead of Y or N.

We can solve this with a case expression wrapped in an aggregate.

In [18]:
%%sql
select p."playerID", p."nameFirst", p."nameLast", sum(b."HR") as home_runs, 
        max(case when hof."inducted" = 'Y' then 1 else 0 end) as hof_induction
from fact_players p 
INNER JOIN dim_batting b on p."playerID" = b."playerID"
LEFT JOIN (
            select "playerID", "inducted"
            from dim_hof hof
            where "yearid" = (select max("yearid") as yrmax 
                              from dim_hof tsub
                              where tsub."playerID" = hof."playerID"
                              group by "playerID"
                             )
            ) hof on p."playerID" = hof."playerID"
group by p."playerID", p."nameFirst", p."nameLast"
order by home_runs desc
limit 10
;


10 rows affected.


playerID,nameFirst,nameLast,home_runs,hof_induction
bondsba01,Barry,Bonds,762,0
aaronha01,Hank,Aaron,755,1
ruthba01,Babe,Ruth,714,1
rodrial01,Alex,Rodriguez,696,0
mayswi01,Willie,Mays,660,1
griffke02,Ken,Griffey,630,1
thomeji01,Jim,Thome,612,0
sosasa01,Sammy,Sosa,609,0
pujolal01,Albert,Pujols,591,0
robinfr02,Frank,Robinson,586,1
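
One caveat: for the 28 players who still have several dim_hof rows in their latest year, the join in the query above would still repeat their batting rows, so their home run sums may be inflated even though the flag is right. A possible fix, sketched below, is to collapse dim_hof to one row per player before joining. This is an illustration under assumptions: in-memory SQLite instead of Postgres, the walshed01 ballot rows from the earlier output, and made-up batting rows. It also flags a player as inducted if any ballot row says Y, rather than looking only at the latest year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
create table dim_batting ("playerID" text, "yearID" integer, "HR" integer);
create table dim_hof ("playerID" text, "yearid" integer, "inducted" text);
-- Hypothetical batting rows (3 home runs in total).
insert into dim_batting values ('walshed01', 1904, 1), ('walshed01', 1905, 2);
-- Ballot rows as shown for walshed01 above: three rows share the latest year.
insert into dim_hof values ('walshed01', 1945, 'N'), ('walshed01', 1946, 'N'),
                           ('walshed01', 1946, 'N'), ('walshed01', 1946, 'Y');
''')

# Latest-year filter, as in the notebook query: three 1946 rows triple the sum.
latest_year = conn.execute('''
select b."playerID", sum(b."HR") as home_runs,
       max(case when hof."inducted" = 'Y' then 1 else 0 end) as hof_induction
from dim_batting b
left join (
    select "playerID", "inducted" from dim_hof h
    where "yearid" = (select max("yearid") from dim_hof t
                      where t."playerID" = h."playerID")
) hof on b."playerID" = hof."playerID"
group by b."playerID"
''').fetchall()

# Pre-aggregated dim_hof: exactly one row per player, so the sum is not inflated.
one_row_per_player = conn.execute('''
select b."playerID", sum(b."HR") as home_runs,
       coalesce(max(hof.hof_induction), 0) as hof_induction
from dim_batting b
left join (
    select "playerID",
           max(case when "inducted" = 'Y' then 1 else 0 end) as hof_induction
    from dim_hof
    group by "playerID"
) hof on b."playerID" = hof."playerID"
group by b."playerID"
''').fetchall()

print(latest_year)
print(one_row_per_player)
```

None of the top ten hitters shown above appear to be affected, but the pre-aggregated join is the safer shape if the dataset is going to feed a model.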


## Now that you've helped provide data that can identify the next hall of famer, your boss wants you to find the top home run hitting team in each of the last 4 seasons so that he can find out what they're doing right.

We will use dim_teams to get information on teams.

In [19]:
%%sql
select *
from dim_teams
limit 10;

10 rows affected.


index,yearID,lgID,teamID,franchID,divID,Rank,G,Ghome,W,L,DivWin,WCWin,LgWin,WSWin,R,AB,H,2B,3B,HR,BB,SO,SB,CS,HBP,SF,RA,ER,ERA,CG,SHO,SV,IPouts,HA,HRA,BBA,SOA,E,DP,FP,name,park,attendance,BPF,PPF,teamIDBR,teamIDlahman45,teamIDretro
0,1871,,BS1,BNA,,3,31,,20,10,,,N,,401,1372,426,70,37,3,60,19.0,73.0,,,,303,109,3.55,22,1,3,828,367,2,42,23,225,,0.838,Boston Red Stockings,South End Grounds I,,103,98,BOS,BS1,BS1
1,1871,,CH1,CNA,,2,28,,19,9,,,N,,302,1196,323,52,21,10,60,22.0,69.0,,,,241,77,2.76,25,0,1,753,308,6,28,22,218,,0.829,Chicago White Stockings,Union Base-Ball Grounds,,104,102,CHI,CH1,CH1
2,1871,,CL1,CFC,,8,29,,10,19,,,N,,249,1186,328,35,40,7,26,25.0,18.0,,,,341,116,4.11,23,0,0,762,346,13,53,34,223,,0.814,Cleveland Forest Citys,National Association Grounds,,96,100,CLE,CL1,CL1
3,1871,,FW1,KEK,,7,19,,7,12,,,N,,137,746,178,19,8,2,33,9.0,16.0,,,,243,97,5.17,19,1,0,507,261,5,21,17,163,,0.803,Fort Wayne Kekiongas,Hamilton Field,,101,107,KEK,FW1,FW1
4,1871,,NY2,NNA,,5,33,,16,17,,,N,,302,1404,403,43,21,1,33,15.0,46.0,,,,313,121,3.72,32,1,0,879,373,7,42,22,227,,0.839,New York Mutuals,Union Grounds (Brooklyn),,90,88,NYU,NY2,NY2
5,1871,,PH1,PNA,,1,28,,21,7,,,Y,,376,1281,410,66,27,9,46,23.0,56.0,,,,266,137,4.95,27,0,0,747,329,3,53,16,194,,0.845,Philadelphia Athletics,Jefferson Street Grounds,,102,98,ATH,PH1,PH1
6,1871,,RC1,ROK,,9,25,,4,21,,,N,,231,1036,274,44,25,3,38,30.0,53.0,,,,287,108,4.3,23,1,0,678,315,3,34,16,220,,0.821,Rockford Forest Citys,Agricultural Society Fair Grounds,,97,99,ROK,RC1,RC1
7,1871,,TRO,TRO,,6,29,,13,15,,,N,,351,1248,384,51,34,6,49,19.0,62.0,,,,362,153,5.51,28,0,0,750,431,4,75,12,198,,0.845,Troy Haymakers,Haymakers' Grounds,,101,100,TRO,TRO,TRO
8,1871,,WS3,OLY,,4,32,,15,15,,,N,,310,1353,375,54,26,6,48,13.0,48.0,,,,303,137,4.37,32,0,0,846,371,4,45,13,217,,0.85,Washington Olympics,Olympics Grounds,,94,98,OLY,WS3,WS3
9,1872,,BL1,BLC,,2,58,,35,19,,,N,,617,2576,747,94,35,14,27,28.0,35.0,15.0,,,434,173,3.02,48,1,1,1545,566,3,63,0,432,,0.829,Baltimore Canaries,Newington Park,,106,102,BAL,BL1,BL1


There is an HR field for home runs in here. How will we get the top teams by season, let alone over the last 4 seasons?

The easiest way is a window function:

In [20]:
%%sql
select "name", "yearID", "HR", rank() over (partition by "yearID" order by "HR" desc) as hr_rank
from dim_teams
where "yearID" in (2013, 2014, 2015, 2016)
;

120 rows affected.


name,yearID,HR,hr_rank
Baltimore Orioles,2013,212,1
Seattle Mariners,2013,188,2
Oakland Athletics,2013,186,3
Toronto Blue Jays,2013,185,4
Atlanta Braves,2013,181,5
Boston Red Sox,2013,178,6
Texas Rangers,2013,176,7
Detroit Tigers,2013,176,7
Chicago Cubs,2013,172,9
Cleveland Indians,2013,171,10
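
The query above ranks every team within its season. To keep only the top team per season, one option is to wrap it in a subquery and filter on the rank. The sketch below assumes an in-memory SQLite database (3.25 or newer, for window functions) rather than Postgres; the 2013 rows come from the output above, while the 2014 rows are hypothetical and exist only to show how `rank()` treats a tie for first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
create table dim_teams ("name" text, "yearID" integer, "HR" integer);
insert into dim_teams values
    ('Baltimore Orioles', 2013, 212),
    ('Seattle Mariners', 2013, 188),
    ('Oakland Athletics', 2013, 186),
    -- Hypothetical rows: two teams tied for the lead in a second season.
    ('Team A', 2014, 200),
    ('Team B', 2014, 200),
    ('Team C', 2014, 150);
''')

# Rank within each season, then keep rank 1; rank() returns every tied leader.
top_teams = conn.execute('''
select "name", "yearID", "HR"
from (
    select "name", "yearID", "HR",
           rank() over (partition by "yearID" order by "HR" desc) as hr_rank
    from dim_teams
) ranked
where hr_rank = 1
order by "yearID", "name"
''').fetchall()

print(top_teams)
```

If exactly one row per season is required even when teams tie, `row_number()` with an extra tie-breaking column in the `order by` would be the usual substitute for `rank()`.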


Now that you have the teams, what about individual players your team could try to sign?

What are the fields we will have to join on to get team and player info? 

In [21]:
%%sql
select p."nameFirst", p."nameLast", b."yearID", t."name", sum(b."HR") as home_runs
from fact_players p, dim_batting b, dim_teams t 
where p."playerID" = b."playerID"
    and t."teamID" = b."teamID"
    and t."yearID" = b."yearID"
    and b."yearID" in (2013, 2014, 2015, 2016)
group by p."playerID", p."nameFirst", p."nameLast", b."yearID", t."name"
order by home_runs desc
limit 10
;

10 rows affected.


nameFirst,nameLast,yearID,name,home_runs
Chris,Davis,2013,Baltimore Orioles,53
Chris,Davis,2015,Baltimore Orioles,47
Mark,Trumbo,2016,Baltimore Orioles,47
Miguel,Cabrera,2013,Detroit Tigers,44
Nelson,Cruz,2015,Seattle Mariners,44
Nelson,Cruz,2016,Seattle Mariners,43
Edwin,Encarnacion,2016,Toronto Blue Jays,42
Khris,Davis,2016,Oakland Athletics,42
Nolan,Arenado,2015,Colorado Rockies,42
Brian,Dozier,2016,Minnesota Twins,42


### The details above are a little busy. How about just getting each team's home run leader over the last 4 years?

In [22]:
%%sql
select *
from (
select p."nameFirst", p."nameLast", t."name", sum(b."HR") as home_runs
        , rank() over (partition by t."name" order by sum(b."HR") desc) as hr_rank
from fact_players p, dim_batting b, dim_teams t 
where p."playerID" = b."playerID"
    and t."teamID" = b."teamID"
    and t."yearID" = b."yearID"
    and b."yearID" in (2013, 2014, 2015, 2016)
group by p."playerID", p."nameFirst", p."nameLast", t."name"
) foo
where "hr_rank" = 1
;

30 rows affected.


nameFirst,nameLast,name,home_runs,hr_rank
Paul,Goldschmidt,Arizona Diamondbacks,112,1
Freddie,Freeman,Atlanta Braves,93,1
Chris,Davis,Baltimore Orioles,164,1
David,Ortiz,Boston Red Sox,140,1
Anthony,Rizzo,Chicago Cubs,118,1
Jose,Abreu,Chicago White Sox,91,1
Jay,Bruce,Cincinnati Reds,99,1
Carlos,Santana,Cleveland Indians,100,1
Nolan,Arenado,Colorado Rockies,111,1
Miguel,Cabrera,Detroit Tigers,125,1


## Takeaways
* Be cognizant of your keys and joins, and be aware of the cardinality of your tables. Looking at a few rows with limit and running a count(*) grouped by a candidate key are good ways to confirm a table's unique key and make sure you do not have a bad join.
* Depending on your needs, you can save a lot of memory and time by doing the work on the database rather than pulling files down locally - it is easier to roll the data up to the highest level you need there, then pull down the result.
* However - you will not be the only user of a database! There will be other analysts and production processes hitting the same tables. If you have a bad query that is taking forever to finish and preventing other queries from completing, you will make your DBAs very angry.
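
The key check from the first takeaway can be wrapped in a small helper. The sketch below is one possible shape, not something from the original notebook: `duplicate_key_count` is a hypothetical function, the database is in-memory SQLite rather than Postgres, and the rows are four of the walshed01 ballots shown earlier. Table and column names are interpolated directly into the SQL, so they are assumed to come from trusted code, not user input.

```python
import sqlite3


def duplicate_key_count(conn, table, columns):
    """Return how many distinct values of the candidate key occur more than once."""
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = (
        f'select count(*) from ('
        f'select {cols} from "{table}" group by {cols} having count(*) > 1'
        f') as dup'
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    'create table dim_hof ("playerID" text, "yearid" integer, "votedBy" text, "inducted" text)'
)
conn.executemany(
    "insert into dim_hof values (?, ?, ?, ?)",
    [
        ("walshed01", 1945, "BBWAA", "N"),
        ("walshed01", 1946, "Nominating Vote", "N"),
        ("walshed01", 1946, "Final Ballot", "N"),
        ("walshed01", 1946, "Old Timers", "Y"),
    ],
)

# A result of 0 means the candidate key is unique in this table.
by_player = duplicate_key_count(conn, "dim_hof", ["playerID"])
by_player_year = duplicate_key_count(conn, "dim_hof", ["playerID", "yearid"])
by_player_year_vote = duplicate_key_count(conn, "dim_hof", ["playerID", "yearid", "votedBy"])
print(by_player, by_player_year, by_player_year_vote)
```

On these sample rows only the three-column combination is unique; whether that holds for the full dim_hof table would need to be checked against the real data before relying on it in a join.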