# Managing Innovation

Solution developed for group 4 based on the requirement specification. 

**Summary of requirement specification:**



TO DO:

    [] Check encoding -> some of the text looks strange :((
    [] Move into group folder
    [] Move clean text function into separate file
    [] Make sure that the clean text function works properly
    [] Standardize/normalize the variables for scoring to allow for weighting to happen more easily
    [] Check if stop words should be included or not for sentiment score

To ensure you have all the dependencies, run the following chunk.

In [219]:
! pip install --user -U nltk
import nltk  # must be imported before nltk.download() can be called
nltk.download()



Then all the functions and packages needed for the task are loaded.

In [1]:
from nltk.corpus import stopwords
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import pandas as pd

from helper_functions import *

Then the data is imported. For an overview of the data, see `data_exploration.ipynb` in the main folder.

In [2]:
(ideas, comments, ideator) = read_data()

### Gather data needed to score the ideas based on requirement specification

The data needed includes the number of votes, whether the idea was selected by an expert and the mean sentiment score of the comments to each idea. 


Before calculating the sentiment score, the comments need to be preprocessed. This is done by removing stop words and punctuation and converting everything to lower case. Furthermore, the words are lemmatized. All of this is done by the `clean_text` function, which is defined in the `helper_functions.py` file.
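The preprocessing steps described above can be sketched roughly as follows. This is not the code from `helper_functions.py`: the name `clean_text_sketch`, its `lemmatize` parameter, and its return shape (one cleaned string per comment) are assumptions made for illustration, and the real `clean_text` may behave differently.

```python
import string

def clean_text_sketch(texts, stop_words, lemmatize=lambda word: word):
    """Hypothetical stand-in for clean_text: lowercase, strip punctuation,
    drop stop words, then lemmatize each remaining token."""
    # translation table that deletes every ASCII punctuation character
    table = str.maketrans("", "", string.punctuation)
    cleaned = []
    for text in texts:
        tokens = str(text).lower().translate(table).split()
        kept = [lemmatize(tok) for tok in tokens if tok not in stop_words]
        cleaned.append(" ".join(kept))
    return cleaned

# toy input; the identity lemmatizer keeps the example free of NLTK data downloads
example = clean_text_sketch(["The bricks are GREAT, really!"], {"the", "are"})
print(example)
```

To lemmatize for real, `nltk.stem.WordNetLemmatizer().lemmatize` could be passed as the `lemmatize` argument, provided the WordNet data has been downloaded.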

In [3]:
com = comments['Comment']

In [5]:
stop_words = set(stopwords.words("english")) # list of stop words
clean_com = clean_text(com, stop_words) # clean comments using clean_text function from helper_functions.py file (see file for code)

In [7]:
sent = SentimentIntensityAnalyzer() 
comments['sentiment_score'] = [sent.polarity_scores(clean_com[i][0])['compound'] for i in range(len(clean_com))]

The mean sentiment score is then calculated for each idea.

In [8]:
df = pd.DataFrame()
for i in ideas['Submission.ID'].unique():
    # average the sentiment score for comments on each idea
    avg_score = comments.loc[comments['Submission.ID'] == i, 'sentiment_score'].mean()

    df = df.append({'submission_id': i, 'avg_sentiment': avg_score}, ignore_index=True)

# the number of votes for each idea
df['votes'] = list(ideas['Number.of.Votes'])
# whether the idea was selected by an expert
df['expert'] = list(ideas['Status(selectedbyexpert)'])
# include the idea text so it can be displayed later
df['idea'] = ideas['Body']
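Note that `DataFrame.append`, used in the loop above, was removed in pandas 2.0, so that cell only runs on older pandas versions. A possible alternative is sketched below with `groupby` and `merge`. The `*_demo` frames and their values are made up for illustration; only the column names are taken from the cells above.

```python
import pandas as pd

# Toy stand-ins for the real `ideas` and `comments` frames (values are invented).
ideas_demo = pd.DataFrame({
    "Submission.ID": [4, 21, 30],
    "Number.of.Votes": [25, 29, 25],
    "Status(selectedbyexpert)": [1, 0, 1],
    "Body": ["idea A", "idea B", "idea C"],
})
comments_demo = pd.DataFrame({
    "Submission.ID": [4, 4, 21],
    "sentiment_score": [0.5, 0.7, 0.2],
})

# Mean sentiment per idea in a single pass instead of a Python loop.
avg = (
    comments_demo.groupby("Submission.ID")["sentiment_score"]
    .mean()
    .rename("avg_sentiment")
    .reset_index()
)

# A left merge keeps ideas without comments (their mean is NaN) and aligns
# rows by ID rather than by position.
df_demo = ideas_demo.merge(avg, on="Submission.ID", how="left").rename(columns={
    "Submission.ID": "submission_id",
    "Number.of.Votes": "votes",
    "Status(selectedbyexpert)": "expert",
    "Body": "idea",
})
print(df_demo)
```

Merging on the ID also removes the need to assume that `ideas['Submission.ID'].unique()` is in the same order as the rows of `ideas`.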

### Calculating the score of each of the ideas

Changing the numbers in the following chunk lets you weight each variable differently when calculating the score.

In [9]:
# setting up weights
sentiment_weight = 1
vote_weight = 1
expert_weight = 1

Now let's calculate the score of each idea.

In [10]:
df['score'] = sentiment_weight * df['avg_sentiment'] + vote_weight * df['votes'] + expert_weight * df['expert']
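Because VADER's compound score lies in [-1, 1] and `expert` is 0 or 1, while vote counts can be much larger, the raw sum is driven mostly by votes when all weights are equal. The normalization mentioned in the TO DO list could look something like the sketch below. The helper `min_max`, the `demo` frame, and the `score_norm` column are hypothetical, and min-max scaling is only one of several reasonable choices.

```python
import pandas as pd

def min_max(series):
    """Rescale a numeric Series to [0, 1]; a constant column maps to 0."""
    span = series.max() - series.min()
    if span == 0:
        return series * 0.0
    return (series - series.min()) / span

# invented values, same column names as `df` above
demo = pd.DataFrame({
    "avg_sentiment": [0.2, 0.6, 1.0],
    "votes": [0, 10, 20],
    "expert": [0, 1, 0],
})
sentiment_weight, vote_weight, expert_weight = 1, 1, 1

# every variable now lives on the same 0-1 scale before weighting
demo["score_norm"] = (
    sentiment_weight * min_max(demo["avg_sentiment"])
    + vote_weight * min_max(demo["votes"])
    + expert_weight * min_max(demo["expert"])
)
print(demo)
```

With all three variables on a common scale, a weight of 2 on one of them means it counts twice as much as the others, which is harder to say about the raw sum.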

### 10 ideas with the highest score

In [11]:
df.nlargest(10, 'score')

| | submission_id | avg_sentiment | votes | expert | idea | score |
|---|---|---|---|---|---|---|
| 23 | 21.0 | 0.57691 | 29 | 0 | I think it would make the gifts of the Advent ... | 29.57691 |
| 71 | 120.0 | 0.772962 | 28 | 0 | Consumer Services handles many contacts from d... | 28.772962 |
| 0 | 4.0 | 0.605035 | 25 | 1 | Often I see a LEGO box get torn open because i... | 26.605035 |
| 9 | 30.0 | 0.383831 | 25 | 1 | I´m so happy that we in the P-shop now has th... | 26.383831 |
| 100 | 181.0 | 0.553592 | 21 | 1 | Far East sourced components are packed in a pl... | 22.553592 |
| 21 | 73.0 | 0.528367 | 20 | 0 | I have for some time been wondering. What actu... | 20.528367 |
| 104 | 194.0 | 0.466227 | 20 | 0 | An element that really could give many new bui... | 20.466227 |
| 4 | 65.0 | 0.548744 | 19 | 0 | We can easily help saving energy by switching ... | 19.548744 |
| 95 | 171.0 | 0.912883 | 15 | 0 | Create some sets that can be build as teams (2... | 15.912883 |
| 18 | 22.0 | 0.735558 | 15 | 0 | Add "QR code" like grafics in some of the free... | 15.735558 |


To print the ideas with the highest scores, run the code below. 

In [12]:
top10 = df.nlargest(10, 'score')  # compute the top 10 once instead of on every iteration
for sub_id, idea in zip(top10['submission_id'], top10['idea']):
    print(f"[INFO] idea {sub_id}: {idea}\n")

[INFO] idea 21.0: I think it would make the gifts of the Advent Calendar truely amazing, if after the 24th there would be a building instruction where using the the bricks of 24 small gifts you could make one big thing. F.ex for Star Wars that you could create a bigger spaceship with the bricks. First,  it would make  day 24 truely special. Second,  it would show to the kids that bricks can be used in different ways,  and encourage them to be creative (and not just assemble sets once)."

[INFO] idea 120.0: Consumer Services handles many contacts from disappointed fans who have purchased “factory sealed” packages that are missing all the minifigures. These consumers have usually bought the set at a non LBR retail store.   There is nothing worse than opening up your new Star Wars or Ninjago set and finding that the minifigures are all gone! The current tape that we use to seal the boxes can easily be tampered with. My colleague, Rocky, did an experiment with a TMNT set that he purcha

### 10 ideas with the lowest score

In [51]:
df.nsmallest(10, 'score')

| | submission_id | avg_sentiment | votes | expert | idea | score |
|---|---|---|---|---|---|---|
| 107 | 204.0 | 0.43275 | 0 | 0 | Give the building an understanding of how far ... | 0.43275 |
| 13 | 13.0 | 0.7459 | 0 | 0 | New ways of playing LEGO have been innovated a... | 0.7459 |
| 96 | 175.0 | 0.5695 | 1 | 0 | I have made some tests and would like to share... | 1.5695 |
| 68 | 93.0 | 0.0 | 2 | 0 | Now that Lego has 80 years, why not to launch... | 2.0 |
| 2 | 205.0 | 0.25935 | 2 | 0 | Hi :) During the christmas holiday I was play... | 2.25935 |
| 77 | 103.0 | 0.4404 | 2 | 0 | I have been collecting the Winter Bakery, Toy ... | 2.4404 |
| 56 | 124.0 | 0.4546 | 2 | 0 | Will we be able to have a Staff Shop in the Sl... | 2.4546 |
| 94 | 200.0 | 0.8887 | 2 | 0 | Unfortunately I have recognized, that under th... | 2.8887 |
| 57 | 126.0 | 0.57584 | 3 | 0 | It would be a great addition to LEGO boxes or ... | 3.57584 |
| 74 | 101.0 | 0.623657 | 3 | 0 | While building with city products it is easy t... | 3.623657 |


To print the ideas with the lowest scores, run the code below. 

In [58]:
bottom10 = df.nsmallest(10, 'score')  # compute the bottom 10 once instead of on every iteration
for sub_id, idea in zip(bottom10['submission_id'], bottom10['idea']):
    print(f"[INFO] idea {sub_id}: {idea}\n")

[INFO] idea 204.0: Give the building an understanding of how far they are in the building experience and why they are building what they are building. Examples could be:* Step 4 of 45 - so the builder understand that they have a lot of steps to complete still, * Show on every page of a sub-build where the final construct will fit on the model"

[INFO] idea 13.0: New ways of playing LEGO have been innovated and develped for a decade. For example 2000-2001 STARWARS Yoda and Darth Vader figure are built by bricks, 2006 Spongebob figure built by bricks and this year 2012 STARWARS R2-D2 can be built by bricks. LEGO bricks can create everything. My new idea is the theme figures are not just minifigures, but also brick- buildings. There is something similar like the figures we saw in LEGOLAND which are built by the bricks. In future LEGO can develop and create more theme figures which are able to built by bricks. For example SUPERHERO BATMAN, LEGO FRIENDS figure, LEGO TOYSTORY BUSSYLIGHTYEAR 