# Data Exploration for Managing Innovation #

This is a Jupyter Notebook, an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and text. This cell contains `markdown`, a lightweight markup language for creating formatted text in a plain-text editor. In markdown cells, you can document your data analysis, take notes, and construct your data narratives. You can run the code in any cell by selecting the cell (indicated by a vertical line to the left of the cell) and pressing `Shift`+`Enter`.

In [1]:
# Install openpyxl into the current Jupyter kernel with pip
import sys
!{sys.executable} -m pip install openpyxl



Let us start by importing the data:

In [2]:
from util import read_data
(ideas, comments, ideator) = read_data()

Your data set has two main tabular components, `ideas` and `comments`, plus an `ideator` table that is inspected further below.

Inspect `ideas` by printing the first 7 rows:

In [3]:
print(ideas.shape)
ideas.head(7)

(108, 11)


Unnamed: 0,user_id,Submission.ID,Topic.Alias,Title,Body,idea type,tags,Publish.Date,Number.of.Votes,Status(selectedbyexpert),prior_experience(idea generation)
0,8,4,core-experience,Make it easier to open the box,Often I see a LEGO box get torn open because i...,Current experience journey,na,2012-12-20 05:05:21,25,1,0
1,8,91,your-ideas,Reconnect with childhood,When taking visitors through the Idea House I ...,product ideas-sets,"retro, classic",2013-01-25 16:58:21,7,0,0
2,16,205,core-experience,LEGO City Counstryside,Hi :) During the christmas holiday I was play...,New ways to build and play,,2013-01-07 07:37:46,2,0,0
3,17,51,your-ideas,Garbage Cans in Billund Parking House,The parking house in Billund provides great pa...,Facilities > Optimization,"garbage, trash, parking, implemented, environm...",2012-12-20 14:15:49,15,0,0
4,17,65,your-ideas,Save Energy,We can easily help saving energy by switching ...,Sustainability > Strategy,"planet promise, energy efficiency",2012-12-20 21:01:05,19,0,0
5,17,147,your-ideas,LEGO Ideas in Brand Retail Stores,By installing a few PCs in brand retail stores...,Retail > Experience,"brand retail, digital experience, new business...",2013-01-02 08:49:00,6,1,0
6,33,125,your-ideas,Antimicrobial LEGO Bricks,There are so many papers and projects about th...,Manufacturing/Engineering > Materials,antimicrobial,2012-12-28 12:08:25,8,1,0


Inspect `comments` by printing the first 7 rows:

In [4]:
print(comments.shape)
comments.head(7)

(526, 10)


Unnamed: 0,user_id,Topic.Alias,Submission.ID,Submission.Title,Comment.ID,Parent.ID,Root.ID,Comment,Posted.At,Number of votes
0,8,core-experience,4,Make it easier to open the box,576,492.0,492.0,"Thank you Camilla, that would be great!",2013-01-15 20:55:00,0
1,15,core-experience,4,Make it easier to open the box,1,,,Whether we can use something like Rip Cord to ...,2012-12-20 06:42:00,1
2,47,core-experience,4,Make it easier to open the box,2,,,"I agree, there must be a better solution. The ...",2012-12-20 07:37:00,0
3,60,core-experience,4,Make it easier to open the box,5,,,Is it possible to use sellotape to seal all th...,2012-12-20 08:18:00,1
4,101,core-experience,4,Make it easier to open the box,29,12.0,12.0,"That is good to know, thanks Camila. Is there ...",2012-12-20 10:49:00,0
5,101,core-experience,4,Make it easier to open the box,275,264.0,264.0,"Yes, it would be easy and quite cheap to add a...",2013-01-03 14:27:00,2
6,101,core-experience,4,Make it easier to open the box,586,491.0,489.0,"Yes, that's even better. A fun and engaging wa...",2013-01-16 10:34:00,0


In [None]:
# Show the last 7 rows of comments
comments.tail(7)

In [None]:
# Inspect the ideator table: its shape and its first 7 rows
print(ideator.shape)
ideator.head(7)

Inspect the fifth element in the `Body` column of `ideas`:

In [None]:
# .iloc selects by position, so 4 is always the fifth row
ideas["Body"].iloc[4]

Note that Python uses zero-based indexing, so the fifth element is at index 4.
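Positions and index labels coincide only while a table keeps its default index. The following minimal sketch, which uses a made-up stand-in for the `Body` column rather than the real data, illustrates how the two can drift apart after filtering:

```python
import pandas as pd

# Toy stand-in for ideas["Body"]; the texts are invented for illustration
body = pd.Series(["idea A", "idea B", "idea C", "idea D", "idea E"])

# With the default index, label 4 and position 4 point to the same row
fifth_by_label = body[4]
fifth_by_position = body.iloc[4]

# Filtering keeps the original labels but shifts the positions
subset = body[body != "idea B"]
by_label_after_filter = subset[4]          # label 4 still exists
by_position_after_filter = subset.iloc[3]  # the same row is now at position 3

print(list(subset.index))
print(by_label_after_filter, by_position_after_filter)
```

In the filtered series, `subset.iloc[4]` would raise an `IndexError`, because only four rows remain.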

Print all ideas for inspection:

In [None]:
for (i, idea) in enumerate(ideas["Body"]):
    print(f"[INFO] idea {i}: {idea}\n")

Get the idea text, its number of votes, and all comments related to `Submission.ID` 205:

In [None]:
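submission_id = 205  # change the submission ID to inspect another idea

# Select the idea text and the vote count for this submission
idea = ideas.loc[ideas["Submission.ID"] == submission_id, "Body"]
print(idea.values[0])
votes = ideas.loc[ideas["Submission.ID"] == submission_id, "Number.of.Votes"]
print(f"\n[INFO] Number of votes for idea {submission_id}: {votes.values[0]}\n")

# Print every comment attached to this submission (there may be none)
idea_comments = comments.loc[comments["Submission.ID"] == submission_id, "Comment"]
for (i, s) in enumerate(idea_comments):
    print(f"Comment {i} for idea {submission_id}: {s}")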

Be aware that some ideas may have no comments.
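One way to find out which ideas have no comments is to count comments per `Submission.ID` and map the counts back onto the ideas. This is a minimal sketch on made-up toy tables that only borrow the column names from the data set; `ideas_toy`, `comments_toy`, and `n_comments` are hypothetical names:

```python
import pandas as pd

# Toy tables with the same key column as the real data; contents are invented
ideas_toy = pd.DataFrame({
    "Submission.ID": [4, 91, 205],
    "Body": ["first idea", "second idea", "third idea"],
})
comments_toy = pd.DataFrame({
    "Submission.ID": [4, 4, 205],
    "Comment": ["nice", "agreed", "interesting"],
})

# Count comments per submission, then look up each idea's count (0 if absent)
comment_counts = comments_toy.groupby("Submission.ID").size()
ideas_toy["n_comments"] = (
    ideas_toy["Submission.ID"].map(comment_counts).fillna(0).astype(int)
)

# Submission IDs of ideas that received no comments at all
without_comments = ideas_toy.loc[
    ideas_toy["n_comments"] == 0, "Submission.ID"
].tolist()
print(ideas_toy[["Submission.ID", "n_comments"]])
print(without_comments)
```

Replacing the toy tables with `ideas` and `comments` should work the same way, assuming both tables share the `Submission.ID` column as in the previews.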

Inspect the number of votes that each idea received:

In [None]:
ideas["Number.of.Votes"]

Summarize the votes using descriptive statistics:

In [None]:
ideas["Number.of.Votes"].describe()

Visualize the distribution of votes with a histogram:

In [None]:
ideas["Number.of.Votes"].hist()
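To see what the summary and the histogram report, it can help to reproduce them on a handful of values. The sketch below uses only the seven vote counts visible in the `ideas` preview, not the full data set, and tabulates the histogram bins with NumPy instead of drawing them. The choice of 5 bins is an assumption made for readability (`hist()` uses 10 by default).

```python
import numpy as np
import pandas as pd

# Vote counts copied from the seven preview rows of `ideas`;
# a small sample for illustration, not the full data set
sample_votes = pd.Series([25, 7, 2, 15, 19, 6, 8], name="Number.of.Votes")

# describe() returns count, mean, std, min, quartiles, and max
summary = sample_votes.describe()
print(summary[["count", "mean", "50%", "max"]])

# A mean above the median (the "50%" row) hints at a right-skewed distribution
mean_above_median = summary["mean"] > summary["50%"]

# Tabulate the bins that a histogram with 5 bins would draw
counts, edges = np.histogram(sample_votes.to_numpy(), bins=5)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n} idea(s)")
```

A few highly voted ideas pull the mean above the median, which is a pattern worth checking on the full `ideas["Number.of.Votes"]` column as well.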

In [None]:
# Print the body of every idea that received exactly 25 votes
idx = ideas["Number.of.Votes"] == 25
vote_ideas = ideas["Body"][idx]
for idea in vote_ideas:
    print(idea)
    print()
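The cell above selects ideas with exactly 25 votes. Two common variations are selecting by a threshold and selecting the most-voted ideas. This is a minimal sketch on a made-up table that reuses the column names of `ideas`; the threshold of 10 and the top-2 cut-off are arbitrary choices for illustration:

```python
import pandas as pd

# Toy table with the same column names as `ideas`; contents are invented
votes_toy = pd.DataFrame({
    "Body": ["idea a", "idea b", "idea c", "idea d"],
    "Number.of.Votes": [25, 7, 2, 15],
})

# Ideas with at least 10 votes (boolean mask, as in the cell above)
at_least_ten = votes_toy[votes_toy["Number.of.Votes"] >= 10]

# The two most-voted ideas, sorted from most to fewest votes
top_two = votes_toy.nlargest(2, "Number.of.Votes")

for body, n in zip(top_two["Body"], top_two["Number.of.Votes"]):
    print(f"{n} votes: {body}")
```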