ChillnDrill is an imaginary company that operates in the mining industry, where it specializes in selling pickaxes.

They have a website where they list, in detail, every kind of pickaxe they sell. Each type of pickaxe has its own listing.

They also post their listings on partner sites, and the majority of their sales come from these partners. That said, sales from their own website are still the most profitable, as they don't have to pay a third party for the cost of the sale.

Now, management is looking for ways to increase revenue from the website channel, and one executive suggested an idea:

"We are using fixed positioning for our listings, and to my knowledge these fixed positions are not based on any logic. Now, from a UX standpoint, a listing near the top will normally receive more traffic than one near the bottom. But with our fixed positioning scheme, what if one of our top listings is actually a bad one? We keep it there and it keeps receiving more traffic, while the sales it generates stay small, so we are losing out here.

Meanwhile, some listings at the bottom might actually be really good ones, but we are not promoting them. So we are losing out here as well.

We would probably see an increase in sales if we could identify which listings are doing well and which are not. Doing it this way, we won't have to spend more on marketing; we only have to put in effort on the technology side."

What do you think of this executive's idea? Do you think it would be feasible in terms of technology, business, etc.?

If you think it's a good idea, please try building a simple version of it:
- Priority:
    1. Over a time period pt (could be a month, a week, or maybe the last 7 days), every listing should receive at least a certain minimum amount of traffic tr. We have no idea how large tr should be; please advise.
    2. Listings that do well should continue to receive traffic, while listings that are not doing well should receive no more once they have received the amount of traffic tr within the time period pt.

    Your model should be the simplest possible version, taking page_views and total_sales as input and returning the rank of a specific listing at time t.
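One way to read the two priorities above is sketched below. It is only an illustration under stated assumptions, not a reference solution. The names `ListingStats`, `rank_listings`, and `min_rate` are hypothetical, and so are the rules that under-exposed listings are shown first and that "doing well" means a conversion rate (total_sales / page_views) of at least `min_rate`. The values of `tr` and `min_rate` are placeholders that would need to be chosen from the data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ListingStats:
    listing_id: str
    page_views: int   # views received so far in the current period pt
    total_sales: int  # sales made so far in the current period pt


def conversion_rate(s: ListingStats) -> float:
    """Sales per page view; 0.0 when the listing has no views yet."""
    return s.total_sales / s.page_views if s.page_views else 0.0


def rank_listings(stats: List[ListingStats], tr: int,
                  min_rate: float) -> Dict[str, Optional[int]]:
    """Return {listing_id: rank}; rank 1 is the top slot, None means not shown.

    Assumed rules (not taken from the brief):
      * listings with fewer than tr views are shown first, fewest views first,
        so that each one reaches the minimum traffic tr (priority 1);
      * listings with at least tr views are kept only if their conversion rate
        is at least min_rate, ordered best rate first (priority 2).
    """
    exploring = sorted((s for s in stats if s.page_views < tr),
                       key=lambda s: s.page_views)
    proven = sorted((s for s in stats
                     if s.page_views >= tr and conversion_rate(s) >= min_rate),
                    key=conversion_rate, reverse=True)

    ranks: Dict[str, Optional[int]] = {s.listing_id: None for s in stats}
    for position, s in enumerate(exploring + proven, start=1):
        ranks[s.listing_id] = position
    return ranks


# Made-up numbers, for illustration only.
example = [
    ListingStats("a", page_views=50, total_sales=1),
    ListingStats("b", page_views=200, total_sales=20),
    ListingStats("c", page_views=200, total_sales=1),
    ListingStats("d", page_views=300, total_sales=15),
]
example_ranks = rank_listings(example, tr=100, min_rate=0.02)
```

With these made-up numbers, listing "a" is still below `tr` and takes the top slot, "b" and "d" follow in order of conversion rate, and "c" is dropped because it has had its share of traffic without converting. Placing proven listings ahead of under-exposed ones would be an equally defensible choice; it trades slower exploration for less short-term revenue risk.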

Code, pseudocode, or math alone is fine. Please do not spend more than 2 hours working on this.

The file pickaxe.json contains the total sales data for pickaxes sold through the website channel over a fixed period (one week). Please use this data to get an understanding of the situation. Feel free to reach out to us for any clarification. We're happy to answer any questions you may have.

In [1]:
import pandas as pd
import re

In [2]:
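# Skip malformed rows instead of raising; on_bad_lines (pandas >= 1.3) replaces the removed error_bad_lines flag
neg_reviews_dataset = pd.read_csv(r"Tweets.csv", on_bad_lines="skip")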


In [3]:
neg_reviews_dataset["text"][74]

"@VirginAmerica not worried, it's been a great ride in a new plane with great crew. All airlines should be like this."

In [4]:
df = neg_reviews_dataset.filter(["airline_sentiment", "negativereason", "text"]) 

In [5]:
df.head(5)

Unnamed: 0,airline_sentiment,negativereason,text
0,neutral,,@VirginAmerica What @dhepburn said.
1,positive,,@VirginAmerica plus you've added commercials t...
2,neutral,,@VirginAmerica I didn't today... Must mean I n...
3,negative,Bad Flight,@VirginAmerica it's really aggressive to blast...
4,negative,Can't Tell,@VirginAmerica and it's a really big bad thing...


In [6]:
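# Keep only negative tweets; copy so later assignments do not act on a view of the original frame
df = df.loc[df['airline_sentiment'] == 'negative'].copy()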

In [7]:
df.head(5)

Unnamed: 0,airline_sentiment,negativereason,text
3,negative,Bad Flight,@VirginAmerica it's really aggressive to blast...
4,negative,Can't Tell,@VirginAmerica and it's a really big bad thing...
5,negative,Can't Tell,@VirginAmerica seriously would pay $30 a fligh...
15,negative,Late Flight,@VirginAmerica SFO-PDX schedule is still MIA.
17,negative,Bad Flight,@VirginAmerica I flew from NYC to SFO last we...


In [8]:
df.loc[df['negativereason'].isin(['Bad Flight', 
                                  'Flight Attendant Complaints']), 
       ['negativereason']] = 'Bad Flights'

In [9]:
df.loc[df['negativereason'].isin(['Customer Service Issue', 
                                  'Flight Booking Problems', 
                                  'longlines']), 
       ['negativereason']] = 'Customer Service'

In [10]:
df.loc[df['negativereason'].isin(['Lost Luggage',
                                  'Damaged Luggage']), 
       ['negativereason']] = 'Luggage Issues'

In [11]:
df.loc[df['negativereason'].isin(['Late Flight',
                                  'Cancelled Flight']), 
       ['negativereason']] = 'Flight Cancellation and Delays'
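The four relabelling cells above could also be written as a single lookup table. The sketch below is one possible way to do that with `Series.replace`; the names `REASON_GROUPS` and `group_reasons` are hypothetical and not part of the notebook. Labels that are missing from the table, such as "Can't Tell", are left unchanged.

```python
import pandas as pd

# Hypothetical lookup table: original label -> merged label, mirroring the cells above.
REASON_GROUPS = {
    "Bad Flight": "Bad Flights",
    "Flight Attendant Complaints": "Bad Flights",
    "Customer Service Issue": "Customer Service",
    "Flight Booking Problems": "Customer Service",
    "longlines": "Customer Service",
    "Lost Luggage": "Luggage Issues",
    "Damaged Luggage": "Luggage Issues",
    "Late Flight": "Flight Cancellation and Delays",
    "Cancelled Flight": "Flight Cancellation and Delays",
}


def group_reasons(reasons: pd.Series) -> pd.Series:
    """Map each reason to its merged group; unmapped labels pass through unchanged."""
    return reasons.replace(REASON_GROUPS)


# Small made-up sample to show the effect.
sample = pd.Series(["Bad Flight", "Can't Tell", "Late Flight", "longlines"])
grouped = group_reasons(sample)
```

In the notebook this would be applied as `df['negativereason'] = group_reasons(df['negativereason'])`.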

In [12]:
df

Unnamed: 0,airline_sentiment,negativereason,text
3,negative,Bad Flights,@VirginAmerica it's really aggressive to blast...
4,negative,Can't Tell,@VirginAmerica and it's a really big bad thing...
5,negative,Can't Tell,@VirginAmerica seriously would pay $30 a fligh...
15,negative,Flight Cancellation and Delays,@VirginAmerica SFO-PDX schedule is still MIA.
17,negative,Bad Flights,@VirginAmerica I flew from NYC to SFO last we...
...,...,...,...
14631,negative,Bad Flights,@AmericanAir thx for nothing on getting us out...
14633,negative,Flight Cancellation and Delays,@AmericanAir my flight was Cancelled Flightled...
14634,negative,Flight Cancellation and Delays,@AmericanAir right on cue with the delays👌
14636,negative,Customer Service,@AmericanAir leaving over 20 minutes Late Flig...


In [13]:
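# Work on a copy so the cleaning steps below do not modify df
tmp = df.copy()

# Strip @mentions (letters only, so digits or underscores in a handle are left behind)
tmp['text'] = [re.sub(r"@[a-zA-Z]*", '', elem) for elem in tmp['text']]

tmp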



Unnamed: 0,airline_sentiment,negativereason,text
3,negative,Bad Flights,"it's really aggressive to blast obnoxious ""en..."
4,negative,Can't Tell,and it's a really big bad thing about it
5,negative,Can't Tell,seriously would pay $30 a flight for seats th...
15,negative,Flight Cancellation and Delays,SFO-PDX schedule is still MIA.
17,negative,Bad Flights,I flew from NYC to SFO last week and couldn'...
...,...,...,...
14631,negative,Bad Flights,thx for nothing on getting us out of the coun...
14633,negative,Flight Cancellation and Delays,"my flight was Cancelled Flightled, leaving to..."
14634,negative,Flight Cancellation and Delays,right on cue with the delays👌
14636,negative,Customer Service,leaving over 20 minutes Late Flight. No warni...


In [14]:
# Strip http(s) URLs
tmp['text'] = [re.sub(r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)", '', elem) for elem in tmp['text']]

In [15]:

# Drop every non-ASCII character (emoji, accented letters, etc.)
tmp['text'] = [elem.encode('ascii', 'ignore').decode('ascii') for elem in tmp['text']]

tmp

Unnamed: 0,airline_sentiment,negativereason,text
3,negative,Bad Flights,"it's really aggressive to blast obnoxious ""en..."
4,negative,Can't Tell,and it's a really big bad thing about it
5,negative,Can't Tell,seriously would pay $30 a flight for seats th...
15,negative,Flight Cancellation and Delays,SFO-PDX schedule is still MIA.
17,negative,Bad Flights,I flew from NYC to SFO last week and couldn'...
...,...,...,...
14631,negative,Bad Flights,thx for nothing on getting us out of the coun...
14633,negative,Flight Cancellation and Delays,"my flight was Cancelled Flightled, leaving to..."
14634,negative,Flight Cancellation and Delays,right on cue with the delays
14636,negative,Customer Service,leaving over 20 minutes Late Flight. No warni...
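The three cleaning steps above (mentions, URLs, non-ASCII characters) can be collected into one helper, which makes them easier to check on a single string. This is a minimal sketch that reuses the same two patterns; the names `MENTION_RE`, `URL_RE`, and `clean_tweet` are hypothetical, and the sample handle and URL are made up.

```python
import re

# Same patterns as in the cells above, compiled once.
MENTION_RE = re.compile(r"@[a-zA-Z]*")
URL_RE = re.compile(
    r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
    r"([-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
)


def clean_tweet(text: str) -> str:
    """Remove @mentions and URLs, then drop any non-ASCII characters."""
    text = MENTION_RE.sub("", text)
    text = URL_RE.sub("", text)
    return text.encode("ascii", "ignore").decode("ascii")


# Made-up input: the handle and the link are placeholders, not real data.
cleaned = clean_tweet("@airline Your chat support is not working: http://example.com/help 👌")
```

Whitespace is not trimmed, so a leading space remains where a mention was removed. In the notebook the helper could be applied with `tmp['text'] = tmp['text'].map(clean_tweet)`.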


In [16]:
tmp['text'][39]

' Your chat support is not working on your site: '

In [21]:
tmp = tmp.filter(['negativereason', 'text'])
tmp

Unnamed: 0,negativereason,text
3,Bad Flights,"it's really aggressive to blast obnoxious ""en..."
4,Can't Tell,and it's a really big bad thing about it
5,Can't Tell,seriously would pay $30 a flight for seats th...
15,Flight Cancellation and Delays,SFO-PDX schedule is still MIA.
17,Bad Flights,I flew from NYC to SFO last week and couldn'...
...,...,...
14631,Bad Flights,thx for nothing on getting us out of the coun...
14633,Flight Cancellation and Delays,"my flight was Cancelled Flightled, leaving to..."
14634,Flight Cancellation and Delays,right on cue with the delays
14636,Customer Service,leaving over 20 minutes Late Flight. No warni...


In [24]:
tmp.to_csv('negative-review.csv', columns = ['negativereason', 'text'], index=False)