# Selecting the smallest of the largest

In [8]:
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escape sequences
movie = pd.read_csv(r'C:\Anaconda\data\movie.csv')
movie2 = movie[['title_year','movie_title', 'imdb_score', 'budget']]
movie2.head()

Unnamed: 0,title_year,movie_title,imdb_score,budget
0,2009.0,Avatar,7.9,237000000.0
1,2007.0,Pirates of the Caribbean: At World's End,7.1,300000000.0
2,2015.0,Spectre,6.8,245000000.0
3,2012.0,The Dark Knight Rises,8.5,250000000.0
4,,Star Wars: Episode VII - The Force Awakens,7.1,


In [9]:
# Use the nlargest method to select the top 100 movies by imdb_score:
movie2.nlargest(100, 'imdb_score').head()

Unnamed: 0,title_year,movie_title,imdb_score,budget
2725,,Towering Inferno,9.5,
1920,1994.0,The Shawshank Redemption,9.3,25000000.0
3402,1972.0,The Godfather,9.2,6000000.0
2779,,Dekalog,9.1,
4312,2016.0,Kickboxer: Vengeance,9.1,17000000.0


In [10]:
# Chain the nsmallest method to return the five lowest budget films among those with a top 100 score:
movie2.nlargest(100, 'imdb_score').nsmallest(5, 'budget')

Unnamed: 0,title_year,movie_title,imdb_score,budget
4804,2014.0,Butterfly Girl,8.7,180000.0
4801,1997.0,Children of Heaven,8.5,180000.0
4706,1957.0,12 Angry Men,8.9,350000.0
4550,2011.0,A Separation,8.4,500000.0
4636,2012.0,The Other Dream Team,8.4,500000.0


In [12]:
# Selecting the largest of each group by sorting
movie2.sort_values('title_year', ascending=False).head()

Unnamed: 0,title_year,movie_title,imdb_score,budget
3884,2016.0,The Veil,4.7,4000000.0
2375,2016.0,My Big Fat Greek Wedding 2,6.1,18000000.0
2794,2016.0,Miracles from Heaven,6.8,13000000.0
92,2016.0,Independence Day: Resurgence,5.5,165000000.0
153,2016.0,Kung Fu Panda 3,7.2,145000000.0


In [14]:
# Let's look at how to sort both year and score:
movie3 = movie2.sort_values(['title_year','imdb_score'], ascending=False)
movie3.head()

Unnamed: 0,title_year,movie_title,imdb_score,budget
4312,2016.0,Kickboxer: Vengeance,9.1,17000000.0
4277,2016.0,A Beginner's Guide to Snuff,8.7,
3798,2016.0,Airlift,8.5,4400000.0
27,2016.0,Captain America: Civil War,8.2,250000000.0
98,2016.0,Godzilla Resurgence,8.2,


In [15]:
# Now, we use the drop_duplicates method to keep only the first row of every year:
movie_top_year = movie3.drop_duplicates(subset='title_year')
movie_top_year.head()

Unnamed: 0,title_year,movie_title,imdb_score,budget
4312,2016.0,Kickboxer: Vengeance,9.1,17000000.0
3745,2015.0,Running Forever,8.6,5000000.0
4369,2014.0,Queen of the Mountains,8.7,1400000.0
3935,2013.0,"Batman: The Dark Knight Returns, Part 2",8.4,3500000.0
3,2012.0,The Dark Knight Rises,8.5,250000000.0


The default behavior of the drop_duplicates method is to keep the first occurrence of
each unique row, which would not drop any rows as each row is unique. However, the
subset parameter alters it to only consider the column (or list of columns) given to it. In
this example, only one row for each year will be returned. As we sorted by year and score in
the last step, the highest scoring movie for each year is what we get.
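
As a minimal sketch of the same idea on made-up data (the `films` frame and its values are hypothetical, not taken from movie.csv), sorting first and then dropping duplicates on a single column keeps the top-scoring row for each year:

```python
import pandas as pd

# Hypothetical toy data, not from movie.csv
films = pd.DataFrame({
    'title_year': [2015, 2016, 2015, 2016],
    'movie_title': ['A', 'B', 'C', 'D'],
    'imdb_score': [7.0, 6.5, 8.1, 9.0],
})

# Sort so that the best film of each year comes first within its year
films_sorted = films.sort_values(['title_year', 'imdb_score'], ascending=False)

# Without subset, whole rows are compared; every row is unique, so none is dropped
rows_kept_without_subset = len(films_sorted.drop_duplicates())

# With subset, only title_year is compared, so one row per year survives
top_per_year = films_sorted.drop_duplicates(subset='title_year')
print(rows_kept_without_subset)
print(top_per_year)
```

Because the first row seen for each year is the one that is kept, the result depends entirely on the sort that precedes the call.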

It is possible to sort one column in ascending order while simultaneously sorting another
column in descending order. To accomplish this, pass a list of booleans to the ascending
parameter, one for each column in the order the columns are listed. The following sorts
title_year and imdb_score in descending order and budget in ascending order. Dropping
duplicates on title_year then keeps the highest-scoring film for each year, with ties in
score broken in favor of the lowest budget:


In [19]:

movie4 = movie[['movie_title', 'title_year', 'imdb_score', 'budget']]
movie4_sorted = movie4.sort_values(['title_year', 'imdb_score', 'budget'],
                                   ascending=[False, False, True])
movie4_sorted.drop_duplicates(subset=['title_year']).head(10)

Unnamed: 0,movie_title,title_year,imdb_score,budget
4312,Kickboxer: Vengeance,2016.0,9.1,17000000.0
3745,Running Forever,2015.0,8.6,5000000.0
4804,Butterfly Girl,2014.0,8.7,180000.0
3935,"Batman: The Dark Knight Returns, Part 2",2013.0,8.4,3500000.0
293,Django Unchained,2012.0,8.5,100000000.0
3853,Samsara,2011.0,8.5,4000000.0
97,Inception,2010.0,8.8,160000000.0
582,Inglourious Basterds,2009.0,8.3,75000000.0
66,The Dark Knight,2008.0,9.0,185000000.0
2646,U2 3D,2007.0,8.4,


By default, drop_duplicates keeps the first occurrence within each group of duplicates. This
behavior may be modified with the keep parameter: pass keep='last' to keep the last row of
each group, or keep=False to drop every row that has a duplicate.
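
The three settings of keep can be compared side by side. This is only a sketch on hypothetical data (the `ranked` frame is invented and assumed to be already sorted by year and score, descending):

```python
import pandas as pd

# Hypothetical toy data, already sorted by year and score (both descending)
ranked = pd.DataFrame({
    'title_year': [2016, 2016, 2015, 2014, 2014],
    'imdb_score': [9.0, 6.5, 8.1, 7.7, 7.2],
})

# keep='first' is the default: the first row of each year survives
first = ranked.drop_duplicates(subset='title_year')

# keep='last': the last row of each year survives (the lowest score, given this sort)
last = ranked.drop_duplicates(subset='title_year', keep='last')

# keep=False: only years that occur exactly once survive
unique_only = ranked.drop_duplicates(subset='title_year', keep=False)

print(first, last, unique_only, sep='\n\n')
```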

# Replicating nlargest with sort_values
Here we replicate the "Selecting the smallest of the largest" recipe with the
sort_values method and explore the differences between the two approaches.

In [24]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
movie = pd.read_csv(r'C:\Anaconda\data\movie.csv')
movie2 = movie[['movie_title', 'imdb_score', 'budget']]
movie_smallest_largest = movie2.nlargest(100, 'imdb_score') \
                               .nsmallest(10, 'budget')
movie_smallest_largest

Unnamed: 0,movie_title,imdb_score,budget
4804,Butterfly Girl,8.7,180000.0
4801,Children of Heaven,8.5,180000.0
4706,12 Angry Men,8.9,350000.0
4550,A Separation,8.4,500000.0
4636,The Other Dream Team,8.4,500000.0
2215,Psycho,8.5,806947.0
4425,Casablanca,8.6,950000.0
4397,"The Good, the Bad and the Ugly",8.9,1200000.0
4395,Reservoir Dogs,8.4,1200000.0
4369,Queen of the Mountains,8.7,1400000.0


In [25]:
# We can use sort_values with head to grab the ten lowest by budget:
movie2.sort_values('imdb_score', ascending=False).head(100) \
.sort_values('budget').head(10)

Unnamed: 0,movie_title,imdb_score,budget
4815,A Charlie Brown Christmas,8.4,150000.0
4801,Children of Heaven,8.5,180000.0
4804,Butterfly Girl,8.7,180000.0
4706,12 Angry Men,8.9,350000.0
4636,The Other Dream Team,8.4,500000.0
2215,Psycho,8.5,806947.0
4425,Casablanca,8.6,950000.0
4395,Reservoir Dogs,8.4,1200000.0
4397,"The Good, the Bad and the Ugly",8.9,1200000.0
848,Stargate SG-1,8.4,1400000.0


Compare the output of the nlargest and nsmallest chain with the output of the sort_values
and head chain. Are they the same? No! What happened?
The issue arises because more than 100 movies exist with a rating of at least 8.4. Each of the
methods, nlargest and sort_values, breaks ties differently, which results in a slightly
different 100-row DataFrame.
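
One possible way to make the two approaches agree is sketched below on hypothetical data (the `scores` frame is invented). It relies on two documented defaults: nlargest uses keep='first', which resolves ties in original row order, while sort_values uses kind='quicksort', which is not guaranteed to be stable. Requesting a stable sort, or asking nlargest to keep every tied row with keep='all', removes the ambiguity:

```python
import pandas as pd

# Hypothetical toy data with a three-way tie at 8.4
scores = pd.DataFrame({
    'movie_title': ['a', 'b', 'c', 'd', 'e'],
    'imdb_score': [8.4, 9.0, 8.4, 8.4, 9.5],
})

# nlargest with the default keep='first': among ties, earlier rows win
top3_nlargest = scores.nlargest(3, 'imdb_score')

# A stable sort preserves the original order of tied rows
top3_sorted = scores.sort_values('imdb_score', ascending=False,
                                 kind='mergesort').head(3)

# keep='all' returns every row tied with the last selected value,
# so the result may contain more than n rows
top_with_ties = scores.nlargest(3, 'imdb_score', keep='all')

print(top3_nlargest['movie_title'].tolist())
print(top3_sorted['movie_title'].tolist())
print(len(top_with_ties))
```

Whether this matters in practice depends on whether the cut-off value is shared by several rows, as it is for the score of 8.4 in the movie data.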

# Calculating a trailing stop order price

For the purposes of this recipe, we will only be examining stop orders used to sell currently
owned stocks. In a typical stop order, the price does not change throughout the lifetime of
the order. For instance, if you purchased a stock for 100 per share, you might want to set a
stop order at 90 per share to limit your downside to 10%.
A more advanced strategy would be to continually modify the sale price of the stop order to
track the value of the stock if it increases in value.
This is called a trailing stop order.
Concretely, if the same 100 stock increases to 120, then a trailing stop order 10% below the
current market value would move the sale price to 108.
The trailing stop order never moves down and is always tied to the maximum value since
the time of purchase. 
If the stock fell from 120 to 110, the stop order would still remain at
108. It would only increase if the price moved above 120.
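
The arithmetic above can be checked with a small sketch. The price series is hypothetical (100, 120, 110, 125), and the running maximum is computed with the pandas cummax method:

```python
import pandas as pd

# Hypothetical closing prices: bought at 100, rises to 120, dips to 110, rises to 125
close = pd.Series([100.0, 120.0, 110.0, 125.0])

# Highest price seen so far at each point in time
running_max = close.cummax()

# The stop price sits 10% below the running maximum, so it can never move down
trailing_stop = running_max * 0.9

print(trailing_stop.round(2).tolist())
```

The stop moves from 90 to 108 when the price reaches 120, stays at 108 during the dip to 110, and rises to 112.5 only once the price exceeds its previous high.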

In [8]:
from iexfinance.stocks import get_historical_data
from datetime import datetime
import pandas as pd
start = datetime(2018, 12, 1)
end = datetime(2019, 1, 24)
dataset = get_historical_data("MSFT", start, end, output_format='pandas')
dataset['Returns']=dataset['close'].pct_change()
dataset['StockName']='Microsoft'
dataset.head(20)

date,open,high,low,close,volume,Returns,StockName
2018-12-03,113.0,113.42,110.73,112.09,34732772,,Microsoft
2018-12-04,111.94,112.6373,108.2115,108.52,45196984,-0.031849,Microsoft
2018-12-06,105.82,109.24,105.0,109.19,49107431,0.006174,Microsoft
2018-12-07,108.38,109.45,104.3,104.82,45044937,-0.040022,Microsoft
2018-12-10,104.8,107.98,103.89,107.59,40801525,0.026426,Microsoft
2018-12-11,109.8,110.95,107.44,108.59,42381947,0.009295,Microsoft
2018-12-12,110.89,111.27,109.04,109.08,36183020,0.004512,Microsoft
2018-12-13,109.58,110.87,108.63,109.45,31333362,0.003392,Microsoft
2018-12-14,108.25,109.26,105.5,106.03,47043136,-0.031247,Microsoft
2018-12-17,105.41,105.8,101.71,102.89,56957314,-0.029614,Microsoft


In [7]:
# Use the cummax method to track the highest closing price until the current date:
MSFT_close=dataset['close']
MSFT_cummax = MSFT_close.cummax()
MSFT_cummax.head(20)

date
2018-12-03    112.09
2018-12-04    112.09
2018-12-06    112.09
2018-12-07    112.09
2018-12-10    112.09
2018-12-11    112.09
2018-12-12    112.09
2018-12-13    112.09
2018-12-14    112.09
2018-12-17    112.09
2018-12-18    112.09
2018-12-19    112.09
2018-12-20    112.09
2018-12-21    112.09
2018-12-24    112.09
2018-12-26    112.09
2018-12-27    112.09
2018-12-28    112.09
2018-12-31    112.09
2019-01-02    112.09
Name: close, dtype: float64

In [9]:
# To limit the downside to 10%, we multiply MSFT_cummax by 0.9.
# This creates the trailing stop order:
MSFT_trailing_stop = MSFT_cummax * .9
MSFT_trailing_stop.head(8)

date
2018-12-03    100.881
2018-12-04    100.881
2018-12-06    100.881
2018-12-07    100.881
2018-12-10    100.881
2018-12-11    100.881
2018-12-12    100.881
2018-12-13    100.881
Name: close, dtype: float64

The cummax method works by retaining the maximum value encountered up to and
including the current value. Multiplying this series by 0.9, or whatever cushion you would
like to use, creates the trailing stop order. In this particular example, Microsoft decreased in
value but its trailing stop has not decreased.