# Instructions

## Description

Open the file ks-projects-201801.csv, which lists about 100,000 Kickstarter projects. Load the data directly into a MongoDB database using the Python API.

Make sure to set the ID of each document manually. Also take care to give the data the right types, so that you can take advantage of the methods MongoDB implements. You do not necessarily need the whole dataset: it is up to you to design your own data model.

## Questions

- 1) Retrieve the 5 projects that received the most pledges.
- 2) Count the number of projects that reached their goal.
- 3) Count the number of projects in each category.
- 4) Count the number of French projects launched before 2016.
- 5) Retrieve the US projects that asked for more than 200,000 dollars.
- 6) Count the number of projects with "Sport" in their name.

In [1]:
import pandas as pd
import pymongo

In [2]:
client = pymongo.MongoClient("mongo")
database = client['exercices']
collection = database['kickstarter']

In [3]:
df_ks = pd.read_csv("data/ks-projects-201801-sample.csv")
df_ks.head()



| | ID | name | category | main_category | currency | deadline | goal | launched | pledged | state | backers | country | usd pledged | usd_pledged_real |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 872782264 | Scott Cooper's Solo CD "A Leg Trick" (Canceled) | Rock | Music | USD | 2011-09-16 | 2000 | 2011-08-17 06:31:31 | 1145 | canceled | 24 | US | 1145.0 | 1145.0 |
| 1 | 1326492673 | Ohceola jewelry | Fashion | Fashion | USD | 2012-08-22 | 18000 | 2012-07-23 20:46:48 | 1851 | failed | 28 | US | 1851.0 | 1851.0 |
| 2 | 1688410639 | Sluff Off & Harald: Two latest EGGs are Classi... | Tabletop Games | Games | USD | 2016-07-19 | 2000 | 2016-07-01 21:55:54 | 7534 | successful | 254 | US | 3796.0 | 7534.0 |
| 3 | 156812982 | SketchPlanner: Create and Plan- all in one bea... | Art Books | Publishing | USD | 2017-09-27 | 13000 | 2017-08-28 15:47:02 | 16298 | successful | 367 | US | 2670.0 | 16298.0 |
| 4 | 1835968190 | Proven sales with custom motorcycle accessories | Sculpture | Art | CAD | 2016-02-24 | 5000 | 2016-01-25 17:37:10 | 1 | failed | 1 | CA | 0.708148 | 0.738225 |


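pandas emits a `DtypeWarning` on this read: it appears when pandas cannot infer a single data type for a column (with the default `low_memory=True`, `read_csv` infers types chunk by chunk, and the chunks disagreed). Helpfully, the message gives the positions of the columns concerned: 6, 8, 10 and 12.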

In [4]:
df_ks.columns[[6,8,10,12]]

Index(['goal', 'pledged', 'backers', 'usd pledged'], dtype='object')
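One way to give these four columns a proper numeric type is to convert them explicitly after reading. A minimal sketch, assuming the non-numeric entries are stray values that can be treated as missing: `pd.to_numeric(..., errors="coerce")` turns anything unparsable into `NaN`. The helper `coerce_numeric` and the sample frame are hypothetical, built only for illustration.

```python
import pandas as pd

# Columns that pandas reported as having mixed types (positions 6, 8, 10 and 12)
NUMERIC_COLUMNS = ["goal", "pledged", "backers", "usd pledged"]


def coerce_numeric(df: pd.DataFrame, columns=NUMERIC_COLUMNS) -> pd.DataFrame:
    """Return a copy of df in which `columns` are numeric.

    Hypothetical helper: any value that cannot be parsed as a number becomes NaN
    (errors="coerce") instead of raising an exception.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Tiny hand-made frame with mixed types (not taken from the dataset)
sample = pd.DataFrame({
    "goal": ["2000", 18000, "not a number"],
    "pledged": [1145, "1851", 7534],
    "backers": ["24", "28", "254"],
    "usd pledged": [1145.0, "1851.0", None],
})
clean = coerce_numeric(sample)
print(clean.dtypes)
```

Numeric fields stored this way become BSON numbers in MongoDB, so range operators such as `$gt` and sorts behave as expected on them.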

## Question 0

### Cleaning the data

In [5]:
# Keep only the launch year (first 4 characters of the timestamp), as a string
df_ks["launched"] = df_ks["launched"].str[:4]
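Slicing keeps only the year as text, which is enough for the questions below but loses the date type. Since the instructions ask for properly typed data, an alternative is to parse `launched` (and `deadline`) with `pd.to_datetime`; pymongo stores `datetime.datetime` values as BSON dates, which MongoDB can compare directly and decompose with operators such as `$year`. A sketch under the assumption that every value follows the format shown in the `head()` output; the helper `parse_dates` is hypothetical, and rows that fail to parse become `NaT`, which would need handling (for example dropping) before insertion.

```python
import pandas as pd


def parse_dates(df: pd.DataFrame, columns=("launched", "deadline")) -> pd.DataFrame:
    """Return a copy of df with `columns` converted to datetime64.

    Hypothetical helper: unparsable values become NaT (errors="coerce").
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


# Two rows copied from the head() output above
sample = pd.DataFrame({
    "launched": ["2011-08-17 06:31:31", "2012-07-23 20:46:48"],
    "deadline": ["2011-09-16", "2012-08-22"],
})
parsed = parse_dates(sample)
print(parsed.dtypes)
print(parsed["launched"].dt.year.tolist())
```

The year remains available on the pandas side through `.dt.year`, so nothing is lost compared with slicing the string.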

In [6]:
df_ks = df_ks.drop(columns=['ID', 'currency', 'deadline', 'backers'])

In [7]:
df_ks.columns

Index(['name', 'category', 'main_category', 'goal', 'launched', 'pledged',
       'state', 'country', 'usd pledged', 'usd_pledged_real'],
      dtype='object')

In [8]:
df_ks = df_ks.rename({"usd pledged":"usd_pledged"}, axis='columns')

### Importing the data

In [9]:
# Empty the collection so that re-running the notebook does not insert duplicates
collection.delete_many({})

<pymongo.results.DeleteResult at 0x7f6bb2c50fc0>

In [20]:
df_ks.dtypes

name                 object
category             object
main_category        object
goal                 object
launched             object
pledged              object
state                object
country              object
usd_pledged          object
usd_pledged_real    float64
dtype: object

In [10]:
records = df_ks.to_dict('records')  # one dict per row; avoids shadowing the built-in dict

In [11]:
collection.insert_many(records)
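The instructions ask for the document ID to be set manually, whereas the cells above drop the `ID` column and let MongoDB generate an `ObjectId`. A minimal sketch of the alternative, assuming the Kickstarter `ID` column is unique (the helper checks this): renaming `ID` to `_id` before `to_dict('records')` makes MongoDB use it as the primary key, so the same project can never be inserted twice. The function `to_mongo_records` is hypothetical.

```python
import pandas as pd


def to_mongo_records(df: pd.DataFrame, id_column: str = "ID") -> list:
    """Convert a DataFrame to MongoDB documents whose _id is the Kickstarter ID.

    Hypothetical helper; assumes id_column holds unique integer IDs.
    """
    if not df[id_column].is_unique:
        raise ValueError(f"{id_column} contains duplicates; it cannot be used as _id")
    records = df.rename(columns={id_column: "_id"}).to_dict("records")
    for record in records:
        # bson cannot encode numpy.int64, so store _id as a plain Python int
        record["_id"] = int(record["_id"])
    return records


# Two rows copied from the head() output (subset of columns)
sample = pd.DataFrame({
    "ID": [1326492673, 1835968190],
    "name": ["Ohceola jewelry", "Proven sales with custom motorcycle accessories"],
    "state": ["failed", "failed"],
})
records = to_mongo_records(sample)
print(records[0]["_id"], records[0]["name"])
# collection.insert_many(records)  # requires the MongoDB connection from the notebook
```

With a natural `_id`, a second `insert_many` of the same rows fails with a duplicate-key error instead of silently duplicating the data.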

## Question 1

In [12]:
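# Take the 5 projects with the largest "pledged" amounts.
# Given a number, $gte only matches numeric values, so this filter also
# excludes any "pledged" values that were stored as strings.
cur = collection.find({"pledged": {"$gte": 30}}).sort([("pledged", -1)]).limit(5)
list(cur)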


[{'_id': ObjectId('5fc50686d1f608e348e43cfc'),
  'name': 'The Everyday Backpack, Tote, and Sling',
  'category': 'Product Design',
  'main_category': 'Design',
  'goal': 500000.0,
  'launched': '2016',
  'pledged': 6565782.5,
  'state': 'successful',
  'country': 'US',
  'usd_pledged': 1462611.0,
  'usd_pledged_real': 6565782.5},
 {'_id': ObjectId('5fc50651d1f608e348e2e5ae'),
  'name': 'Pono Music - Where Your Soul Rediscovers Music',
  'category': 'Sound',
  'main_category': 'Technology',
  'goal': 800000.0,
  'launched': '2014',
  'pledged': 6225354.98,
  'state': 'successful',
  'country': 'US',
  'usd_pledged': 6225354.98,
  'usd_pledged_real': 6225354.98},
 {'_id': ObjectId('5fc50651d1f608e348e2e8e3'),
  'name': 'The Veronica Mars Movie Project',
  'category': 'Narrative Film',
  'main_category': 'Film & Video',
  'goal': 2000000.0,
  'launched': '2013',
  'pledged': 5702153.38,
  'state': 'successful',
  'country': 'US',
  'usd_pledged': 5702153.38,
  'usd_pledged_real': 5702153.
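`pledged` is expressed in each project's own currency (see the `currency` column in the `head()` output), so sorting on it compares amounts in different currencies. A sketch of the same query on `usd_pledged_real`, the USD-converted pledged amount, written as an aggregation pipeline. The `$type: "number"` stage keeps only numeric values, because in a sort on a field of mixed types MongoDB places strings above numbers. This is an alternative reading of the question, not the notebook's answer; `top_pledged_pipeline` is a hypothetical helper.

```python
def top_pledged_pipeline(n: int = 5, field: str = "usd_pledged_real") -> list:
    """Aggregation pipeline returning the n documents with the largest `field`."""
    return [
        # Keep only documents where the field is numeric (int, long, double, decimal)
        {"$match": {field: {"$type": "number"}}},
        {"$sort": {field: -1}},
        {"$limit": n},
        # Show only the fields needed to read the answer
        {"$project": {"_id": 0, "name": 1, "country": 1, field: 1}},
    ]


pipeline = top_pledged_pipeline()
print(pipeline)
# list(collection.aggregate(pipeline))  # run against the notebook's collection
```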

## Question 2

In [25]:
n_successful = collection.count_documents({"state": "successful"})
print(n_successful)

53040
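`state == "successful"` is Kickstarter's own verdict. The question can also be read literally as "pledged at least the goal"; `pledged` and `goal` are both in the project's own currency, so they can be compared directly with `$expr`. A sketch of that filter, assuming both fields were stored as numbers (the `$type` conditions skip documents where they were not). The two counts need not match, for instance for projects whose state is still `live`. `goal_reached_filter` is a hypothetical helper.

```python
def goal_reached_filter() -> dict:
    """Filter for documents where pledged >= goal (both in the project's currency)."""
    return {
        # Only compare documents where both fields are numeric
        "pledged": {"$type": "number"},
        "goal": {"$type": "number"},
        # $expr lets a query compare two fields of the same document
        "$expr": {"$gte": ["$pledged", "$goal"]},
    }


query = goal_reached_filter()
print(query)
# collection.count_documents(query)  # run against the notebook's collection
```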


## Question 3

In [14]:
cur = collection.aggregate([{"$group" : {"_id" : "$category", "NumberByCategory" : {"$sum" : 1}}}])
list(cur)

[{'_id': 'Embroidery', 'NumberByCategory': 49},
 {'_id': 'Public Art', 'NumberByCategory': 1248},
 {'_id': 'Flight', 'NumberByCategory': 158},
 {'_id': 'Rock', 'NumberByCategory': 2707},
 {'_id': 'Performances', 'NumberByCategory': 414},
 {'_id': 'Puzzles', 'NumberByCategory': 95},
 {'_id': 'Audio', 'NumberByCategory': 164},
 {'_id': 'Gaming Hardware', 'NumberByCategory': 178},
 {'_id': 'Poetry', 'NumberByCategory': 532},
 {'_id': 'Theater', 'NumberByCategory': 2786},
 {'_id': 'Makerspaces', 'NumberByCategory': 91},
 {'_id': 'Photobooks', 'NumberByCategory': 608},
 {'_id': 'Family', 'NumberByCategory': 130},
 {'_id': 'Punk', 'NumberByCategory': 127},
 {'_id': 'Zines', 'NumberByCategory': 144},
 {'_id': 'Movie Theaters', 'NumberByCategory': 90},
 {'_id': 'Publishing', 'NumberByCategory': 2332},
 {'_id': 'Community Gardens', 'NumberByCategory': 115},
 {'_id': 'Graphic Design', 'NumberByCategory': 765},
 {'_id': 'Weaving', 'NumberByCategory': 38},
 {'_id': 'Action', 'NumberByCategory': 28
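`$group` returns the groups in no particular order. Adding a `$sort` stage lists the largest categories first; a sketch, with pandas' `value_counts()` (sorted by count, descending, by default) as a cross-check on a DataFrame. The sample frame is made up for illustration.

```python
import pandas as pd

# Same grouping as above, followed by a sort on the count (largest first)
pipeline = [
    {"$group": {"_id": "$category", "NumberByCategory": {"$sum": 1}}},
    {"$sort": {"NumberByCategory": -1}},
]
# list(collection.aggregate(pipeline))  # run against the notebook's collection

# Cross-check on a small hand-made DataFrame
sample = pd.DataFrame({"category": ["Rock", "Fashion", "Rock", "Tabletop Games", "Rock", "Fashion"]})
counts = sample["category"].value_counts()
print(counts)
```

One difference to keep in mind: `value_counts()` drops missing categories by default, whereas `$group` collects documents without a `category` under `_id: null`.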

## Question 4

In [22]:
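# Count French projects launched before 2016. "launched" holds the year as a
# four-character string, so the string comparison "$lt": "2016" matches earlier years.
n_fr_before_2016 = collection.count_documents({"launched": {"$lt": "2016"}, "country": "FR"})
print(n_fr_before_2016)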

330
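The string comparison works only because every `launched` value is a four-digit year, so lexicographic and chronological order agree. If `launched` had instead been stored as a full date (a BSON date, e.g. converted with `pd.to_datetime` before import), the same question becomes a date comparison. A sketch under that assumption; `french_before_filter` is a hypothetical helper.

```python
from datetime import datetime


def french_before_filter(year: int) -> dict:
    """Filter for French projects launched strictly before 1 January of `year`.

    Assumes `launched` is stored as a BSON date (a Python datetime on insert).
    """
    return {"country": "FR", "launched": {"$lt": datetime(year, 1, 1)}}


query = french_before_filter(2016)
print(query)
# collection.count_documents(query)  # run against the notebook's collection
```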


## Question 5

In [16]:
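# US projects with a goal of at least 200,000. $gte also matches a goal of exactly
# 200,000 (as in the first result below); "$gt" fits a strict "more than" reading.
cur = collection.find({"goal": {"$gte": 200000}, "country": "US"})
list(cur)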

[{'_id': ObjectId('5fc50631d1f608e348e20a9b'),
  'name': 'Far from Par is a movie about a man and a talking golf ball.',
  'category': 'Comedy',
  'main_category': 'Film & Video',
  'goal': 200000.0,
  'launched': '2014',
  'pledged': 10.0,
  'state': 'failed',
  'country': 'US',
  'usd_pledged': 10.0,
  'usd_pledged_real': 10.0},
 {'_id': ObjectId('5fc50631d1f608e348e20aae'),
  'name': 'A CALL TO ADVENTURE',
  'category': 'Film & Video',
  'main_category': 'Film & Video',
  'goal': 287000.0,
  'launched': '2012',
  'pledged': 1465.0,
  'state': 'failed',
  'country': 'US',
  'usd_pledged': 1465.0,
  'usd_pledged_real': 1465.0},
 {'_id': ObjectId('5fc50632d1f608e348e20b61'),
  'name': 'Storybricks, the storytelling online RPG',
  'category': 'Video Games',
  'main_category': 'Games',
  'goal': 250000.0,
  'launched': '2012',
  'pledged': 23680.54,
  'state': 'failed',
  'country': 'US',
  'usd_pledged': 23680.54,
  'usd_pledged_real': 23680.54},
 {'_id': ObjectId('5fc50632d1f608e348e20
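The result list is long and repeats every field of every document. A sketch that keeps only the fields the question is about and uses the strict `$gt` comparison; the projection fields and the sort on `goal` are illustrative choices, and `big_us_query` is a hypothetical helper.

```python
def big_us_query(min_goal: float = 200_000):
    """Build (filter, projection) for US projects with a goal strictly above min_goal."""
    query = {"country": "US", "goal": {"$gt": min_goal}}
    # Keep only the fields relevant to the question; _id is dropped explicitly
    projection = {"_id": 0, "name": 1, "goal": 1, "state": 1}
    return query, projection


query, projection = big_us_query()
print(query, projection)
# list(collection.find(query, projection).sort("goal", -1))  # largest goals first
```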

## Question 6 

In [18]:
# A text index on "name" is required before querying with the $text operator
collection.create_index([("name", "text")])

'name_text'

In [26]:
n_sport = collection.count_documents({"$text": {"$search": "Sport"}})
print(n_sport)

318
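`$text` works on whole words: it is case-insensitive and, with the index's default English stemming, "Sport" also matches "sports", but not words such as "Sportswear" that merely contain the string. If the question is read as a substring match, a case-insensitive `$regex` on `name` does it (no text index needed, although such a regex cannot use an index efficiently). A sketch, with a local check using Python's `re` on a few made-up names to show what each pattern includes:

```python
import re

# Case-insensitive substring match on the project name
SPORT_FILTER = {"name": {"$regex": "sport", "$options": "i"}}
# collection.count_documents(SPORT_FILTER)  # run against the notebook's collection

names = ["Sports Camp for Kids", "Esports Arena", "Sportswear line", "Transport app", "Garden Tools"]

# Plain substring: also catches "Esports" and "Transport"
substring = re.compile("sport", re.IGNORECASE)
print([name for name in names if substring.search(name)])

# \b anchors the match at the start of a word, excluding "Esports" and "Transport"
word_start = re.compile(r"\bsport", re.IGNORECASE)
print([name for name in names if word_start.search(name)])
```

The same `\bsport` pattern can be passed to `$regex` if only words starting with "sport" should count.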
