# Instructions

## Description

Open the file ks-projects-201801.csv, which lists roughly 100,000 Kickstarter projects. Load the data directly into a Mongo database using the Python API.

Make sure to set the document ID manually. Also cast the data to appropriate types so you can take advantage of the methods Mongo implements. You do not necessarily need the whole dataset; it is up to you to design your data model.

## Questions

- 1) Retrieve the 5 projects that received the most pledges.
- 2) Count the number of projects that reached their goal.
- 3) Count the number of projects in each category.
- 4) Count the number of French projects launched before 2016.
- 5) Retrieve the American projects that asked for more than 200,000 dollars.
- 6) Count the number of projects with "Sport" in their name.

In [9]:
import pandas as pd
import pymongo
import pprint

In [2]:
client = pymongo.MongoClient('mongodb://mongo:27017')
database = client['exercices']
collection = database['kickstarter']

In [3]:
df_ks = pd.read_csv("./data/ks-projects-201801-sample.csv")
df_ks.head()

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


| Unnamed: 0 | ID | name | category | main_category | currency | deadline | goal | launched | pledged | state | backers | country | usd pledged | usd_pledged_real |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 872782264 | Scott Cooper's Solo CD "A Leg Trick" (Canceled) | Rock | Music | USD | 2011-09-16 | 2000 | 2011-08-17 06:31:31 | 1145 | canceled | 24 | US | 1145.0 | 1145.0 |
| 1 | 1326492673 | Ohceola jewelry | Fashion | Fashion | USD | 2012-08-22 | 18000 | 2012-07-23 20:46:48 | 1851 | failed | 28 | US | 1851.0 | 1851.0 |
| 2 | 1688410639 | Sluff Off & Harald: Two latest EGGs are Classi... | Tabletop Games | Games | USD | 2016-07-19 | 2000 | 2016-07-01 21:55:54 | 7534 | successful | 254 | US | 3796.0 | 7534.0 |
| 3 | 156812982 | SketchPlanner: Create and Plan- all in one bea... | Art Books | Publishing | USD | 2017-09-27 | 13000 | 2017-08-28 15:47:02 | 16298 | successful | 367 | US | 2670.0 | 16298.0 |
| 4 | 1835968190 | Proven sales with custom motorcycle accessories | Sculpture | Art | CAD | 2016-02-24 | 5000 | 2016-01-25 17:37:10 | 1 | failed | 1 | CA | 0.708148 | 0.738225 |


This warning appears when pandas cannot infer a single data type for a column. Helpfully, it names the affected columns: 6, 8, 10 and 12.

In [4]:
df_ks.columns[[6,8,10,12]]

Index(['goal', 'pledged', 'backers', 'usd pledged'], dtype='object')
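
One way to deal with the mixed types reported for these columns is to coerce them to numbers. The sketch below uses a small made-up frame in place of `df_ks` (the values and the stray `"N/A"` entry are assumptions for illustration only). `pd.to_numeric` with `errors="coerce"` replaces anything it cannot parse with `NaN` instead of raising.

```python
import pandas as pd

# Toy stand-in for df_ks: the stray "N/A" is the kind of value that forces the object dtype.
toy = pd.DataFrame({
    "goal": ["2000", "18000", "N/A"],
    "backers": ["24", "28", "1"],
})

for col in ["goal", "backers"]:
    # errors="coerce" turns unparseable entries into NaN instead of raising
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

print(toy.dtypes)
```

On the real data, the same loop could presumably run over `df_ks.columns[[6,8,10,12]]`; counting the resulting `NaN` values per column is a quick way to see how many entries were not numeric.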

## Question 0

### Clean the data

In [20]:
df_ks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   ID                150000 non-null  int64  
 1   name              149998 non-null  object 
 2   category          150000 non-null  object 
 3   main_category     150000 non-null  object 
 4   currency          150000 non-null  object 
 5   deadline          150000 non-null  object 
 6   goal              150000 non-null  object 
 7   launched          150000 non-null  object 
 8   pledged           150000 non-null  object 
 9   state             150000 non-null  object 
 10  backers           150000 non-null  object 
 11  country           150000 non-null  object 
 12  usd pledged       148518 non-null  object 
 13  usd_pledged_real  150000 non-null  float64
dtypes: float64(1), int64(1), object(12)
memory usage: 16.0+ MB


In [18]:
# Disabled attempt: 'date' is not a valid pandas dtype, so these astype calls would fail.
#df_ks=df_ks.astype({'launched': 'date'})
#df_ks=df_ks.astype({'deadline': 'date'})
df_ks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   ID                150000 non-null  int64  
 1   name              149998 non-null  object 
 2   category          150000 non-null  object 
 3   main_category     150000 non-null  object 
 4   currency          150000 non-null  object 
 5   deadline          150000 non-null  object 
 6   goal              150000 non-null  object 
 7   launched          150000 non-null  object 
 8   pledged           150000 non-null  object 
 9   state             150000 non-null  object 
 10  backers           150000 non-null  object 
 11  country           150000 non-null  object 
 12  usd pledged       148518 non-null  object 
 13  usd_pledged_real  150000 non-null  float64
dtypes: float64(1), int64(1), object(12)
memory usage: 16.0+ MB
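
The `deadline` and `launched` columns are still plain strings at this point. A possible way to turn them into real dates before insertion is sketched below on a plain record dict rather than on `df_ks`. `parse_dates` is a hypothetical helper, and the two format strings are assumptions based on the sample rows shown earlier. pymongo stores Python `datetime` values as BSON dates, which is what makes date queries possible on the Mongo side.

```python
from datetime import datetime

def parse_dates(record):
    """Return a copy of `record` with `deadline` and `launched` as datetime objects."""
    out = dict(record)
    # Formats assumed from the sample rows: "2011-09-16" and "2011-08-17 06:31:31"
    out["deadline"] = datetime.strptime(record["deadline"], "%Y-%m-%d")
    out["launched"] = datetime.strptime(record["launched"], "%Y-%m-%d %H:%M:%S")
    return out

row = {"ID": 872782264, "deadline": "2011-09-16", "launched": "2011-08-17 06:31:31"}
converted = parse_dates(row)
print(converted["launched"].year, converted["deadline"].month)
```

Within pandas itself, `pd.to_datetime` is the usual equivalent for a whole column.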


### Import the data

In [4]:
import json
data_json = json.loads(df_ks.to_json(orient='records'))
collection.insert_many(data_json)

<pymongo.results.InsertManyResult at 0x7fadbe836900>
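
The instructions ask for the document ID to be set manually, whereas the insert above lets Mongo generate an `ObjectId`. A minimal sketch of one way to do this is to move the Kickstarter `ID` into `_id` before inserting. `to_documents` is a hypothetical helper and the sample record is taken from the first row shown earlier.

```python
def to_documents(records):
    """Build Mongo-ready documents that use the Kickstarter ID as `_id`."""
    docs = []
    for rec in records:
        # Copy every field except ID, then store ID under Mongo's primary key
        doc = {key: value for key, value in rec.items() if key != "ID"}
        doc["_id"] = rec["ID"]
        docs.append(doc)
    return docs

sample = [{"ID": 872782264, "category": "Rock", "state": "canceled", "country": "US"}]
docs = to_documents(sample)
print(docs[0])

# With the notebook's collection (not executed here):
# collection.insert_many(docs)
```

With an explicit `_id`, running the import a second time raises a duplicate key error instead of silently storing each project twice.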

## Question 1

In [7]:
collection.find_one(sort=[("usd_pledged_real", pymongo.DESCENDING)])

{'_id': ObjectId('5fc4fa9c5ee46acf34b2e2e2'),
 'ID': 342886736,
 'name': "COOLEST COOLER: 21st Century Cooler that's Actually Cooler",
 'category': 'Product Design',
 'main_category': 'Design',
 'currency': 'USD',
 'deadline': '2014-08-30',
 'goal': '50000.0',
 'launched': '2014-07-08 10:14:37',
 'pledged': '13285226.36',
 'state': 'successful',
 'backers': '62642',
 'country': 'US',
 'usd pledged': '13285226.36',
 'usd_pledged_real': 13285226.36}

In [15]:
projects = []
for i in collection.find({"usd_pledged_real" : {"$gt" : 12000000}}) :
    projects.append(i)
pprint.pprint(projects)

[{'ID': 342886736,
  '_id': ObjectId('5fc4fa9c5ee46acf34b2e2e2'),
  'backers': '62642',
  'category': 'Product Design',
  'country': 'US',
  'currency': 'USD',
  'deadline': '2014-08-30',
  'goal': '50000.0',
  'launched': '2014-07-08 10:14:37',
  'main_category': 'Design',
  'name': "COOLEST COOLER: 21st Century Cooler that's Actually Cooler",
  'pledged': '13285226.36',
  'state': 'successful',
  'usd pledged': '13285226.36',
  'usd_pledged_real': 13285226.36},
 {'ID': 2103598555,
  '_id': ObjectId('5fc4fa9d5ee46acf34b33ddb'),
  'backers': '66673',
  'category': 'Product Design',
  'country': 'US',
  'currency': 'USD',
  'deadline': '2016-06-30',
  'goal': '1000000.0',
  'launched': '2016-05-24 15:49:52',
  'main_category': 'Design',
  'name': 'Pebble 2, Time 2 + All-New Pebble Core',
  'pledged': '12779843.49',
  'state': 'successful',
  'usd pledged': '12779843.49',
  'usd_pledged_real': 12779843.49},
 {'ID': 342886736,
  '_id': ObjectId('5fccd03e5ee46acf34b52cd3'),
  'backers': '6
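
Question 1 asks for five projects, whereas `find_one` returns a single document and the `$gt` filter depends on a hand-picked threshold. A possible alternative is to sort and limit on the server, as sketched below. `top_pledged` is a hypothetical helper; it assumes `usd_pledged_real` is stored as a number, as in the documents shown above, and takes the notebook's `collection` as an argument.

```python
def top_pledged(collection, n=5):
    """Return the `n` documents with the highest `usd_pledged_real`."""
    # -1 is the value of pymongo.DESCENDING
    cursor = collection.find().sort("usd_pledged_real", -1).limit(n)
    return list(cursor)

# Usage with the notebook's collection (not executed here):
# for project in top_pledged(collection):
#     print(project["name"], project["usd_pledged_real"])
```

If the data was imported more than once, the same project can appear several times in the result, so deduplicating the collection first may be necessary.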

## Question 2  

In [5]:
# count_documents counts on the server instead of loading every document into a list
collection.count_documents({"state": "successful"})


159120
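
The count above relies on the `state` field. If "reached their goal" is read literally as pledged >= goal, an `$expr` filter can compare two fields of the same document. This is only a sketch, under the assumption that both fields are stored as numbers (in the documents shown earlier they are strings, so they would need converting first). `reached_goal_filter` is a hypothetical name.

```python
def reached_goal_filter():
    """Filter matching documents whose pledged amount is at least their goal."""
    # $expr lets a query compare two fields of the same document (MongoDB 3.6+)
    return {"$expr": {"$gte": ["$pledged", "$goal"]}}

print(reached_goal_filter())

# Usage with the notebook's collection (not executed here):
# collection.count_documents(reached_goal_filter())
```

The two readings can give different counts, for example for a canceled project that had already passed its goal.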

## Question 3

In [25]:
categories = collection.distinct("category")
print(categories)

['3D Printing', 'Academic', 'Accessories', 'Action', 'Animals', 'Animation', 'Anthologies', 'Apparel', 'Apps', 'Architecture', 'Art', 'Art Books', 'Audio', 'Bacon', 'Blues', 'Calendars', 'Camera Equipment', 'Candles', 'Ceramics', "Children's Books", 'Childrenswear', 'Chiptune', 'Civic Design', 'Classical Music', 'Comedy', 'Comic Books', 'Comics', 'Community Gardens', 'Conceptual Art', 'Cookbooks', 'Country & Folk', 'Couture', 'Crafts', 'Crochet', 'DIY', 'DIY Electronics', 'Dance', 'Design', 'Digital Art', 'Documentary', 'Drama', 'Drinks', 'Electronic Music', 'Embroidery', 'Events', 'Experimental', 'Fabrication Tools', 'Faith', 'Family', 'Fantasy', "Farmer's Markets", 'Farms', 'Fashion', 'Festivals', 'Fiction', 'Film & Video', 'Fine Art', 'Flight', 'Food', 'Food Trucks', 'Footwear', 'Gadgets', 'Games', 'Gaming Hardware', 'Glass', 'Grace is Leaving', 'Graphic Design', 'Graphic Novels', 'Hardware', 'Hip-Hop', 'Horror', 'Illustration', 'Immersive', 'Indie Rock', 'Installations', 'Interacti

In [26]:
for i in categories :
    # count_documents replaces the deprecated Cursor.count()
    print(i, " : ", collection.count_documents({"category": i}))

3D Printing  :  542
Academic  :  734
Accessories  :  2408
Action  :  564
Animals  :  194
Animation  :  2034
Anthologies  :  600
Apparel  :  5654
Apps  :  5070
Architecture  :  600
Art  :  6716
Art Books  :  2130
Audio  :  328
Bacon  :  156
Blues  :  226
Calendars  :  224
Camera Equipment  :  330
Candles  :  336
Ceramics  :  256
Children's Books  :  5372
Childrenswear  :  384
Chiptune  :  20
Civic Design  :  260
Classical Music  :  2128
Comedy  :  1846
Comic Books  :  2064
Comics  :  3862
Community Gardens  :  230
Conceptual Art  :  786
Cookbooks  :  434
Country & Folk  :  3580
Couture  :  216
Crafts  :  3668
Crochet  :  122
DIY  :  960
DIY Electronics  :  716
Dance  :  1802
Design  :  3282
Digital Art  :  1048
Documentary  :  12996
Drama  :  1742
Drinks  :  1990
Electronic Music  :  1716
Embroidery  :  98
Events  :  644
Experimental  :  714
Fabrication Tools  :  192
Faith  :  878
Family  :  260
Fantasy  :  264
Farmer's Markets  :  350
Farms  :  964
Fashion  :  6758
Festivals  :  626
Fi
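
The loop above sends one query per category. A possible alternative is a single aggregation with `$group`, sketched below. `CATEGORY_COUNT_PIPELINE` and `count_by_category` are hypothetical names, and the function takes the notebook's `collection` as an argument.

```python
# Group documents by category, count them, then sort by category name
CATEGORY_COUNT_PIPELINE = [
    {"$group": {"_id": "$category", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]

def count_by_category(collection):
    """Return a {category: number of projects} dict computed by the server."""
    return {row["_id"]: row["count"] for row in collection.aggregate(CATEGORY_COUNT_PIPELINE)}

# Usage with the notebook's collection (not executed here):
# for category, count in count_by_category(collection).items():
#     print(category, " : ", count)
```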

## Question 4

In [25]:
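# Field name corrected from "lauched". Dates are stored as "YYYY-MM-DD HH:MM:SS" strings,
# so a string comparison orders them chronologically. Assumes France is coded "FR".
collection.count_documents({"country": "FR", "launched": {"$lt": "2016-01-01"}})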
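
If `launched` is stored as a BSON date rather than a string (an assumption; the documents shown earlier store strings), the bound of the comparison has to be a `datetime` as well. The sketch below builds such a filter. `launched_before_filter` is a hypothetical helper, and `"FR"` as the country code for France is assumed by analogy with `"US"` and `"CA"` in the sample rows.

```python
from datetime import datetime

def launched_before_filter(country, year):
    """Filter for projects from `country` launched strictly before 1 January of `year`."""
    return {"country": country, "launched": {"$lt": datetime(year, 1, 1)}}

french_before_2016 = launched_before_filter("FR", 2016)
print(french_before_2016)

# Usage with the notebook's collection (not executed here):
# collection.count_documents(french_before_2016)
```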

## Question 5

In [21]:
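# Caution: this filter keeps every project with goal >= 2000, from any country.
# The question asks for US projects with a goal above 200,000.
cur = collection.find({ "goal" : {"$gte":2000}})
list(cur)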

[{'_id': ObjectId('5fc4fa9a5ee46acf34b1b7bd'),
  'ID': 872782264,
  'name': 'Scott Cooper\'s Solo CD "A Leg Trick" (Canceled)',
  'category': 'Rock',
  'main_category': 'Music',
  'currency': 'USD',
  'deadline': '2011-09-16',
  'goal': 2000.0,
  'launched': '2011-08-17 06:31:31',
  'pledged': 1145.0,
  'state': 'canceled',
  'backers': 24,
  'country': 'US',
  'usd pledged': 1145.0,
  'usd_pledged_real': 1145.0},
 {'_id': ObjectId('5fc4fa9a5ee46acf34b1b7be'),
  'ID': 1326492673,
  'name': 'Ohceola jewelry',
  'category': 'Fashion',
  'main_category': 'Fashion',
  'currency': 'USD',
  'deadline': '2012-08-22',
  'goal': 18000.0,
  'launched': '2012-07-23 20:46:48',
  'pledged': 1851.0,
  'state': 'failed',
  'backers': 28,
  'country': 'US',
  'usd pledged': 1851.0,
  'usd_pledged_real': 1851.0},
 {'_id': ObjectId('5fc4fa9a5ee46acf34b1b7bf'),
  'ID': 1688410639,
  'name': 'Sluff Off & Harald: Two latest EGGs are Classics "old & new"',
  'category': 'Tabletop Games',
  'main_category': 
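
Question 5 asks for American projects with a goal above 200,000 dollars. A filter matching that wording might look like the sketch below. It assumes `goal` is stored as a number (a numeric bound does not match goals stored as strings) and that American projects carry `country` equal to `"US"`, as in the sample rows. `us_goal_above` is a hypothetical helper.

```python
def us_goal_above(threshold=200_000):
    """Filter for US projects whose goal is strictly greater than `threshold`."""
    return {"country": "US", "goal": {"$gt": threshold}}

print(us_goal_above())

# Usage with the notebook's collection (not executed here):
# projects = list(collection.find(us_goal_above()))
```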

## Question 6 

In [29]:
# Case-sensitive substring match on the project name, counted on the server
collection.count_documents({"name" : {'$regex' : "Sport" }})

646
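
The pattern `"Sport"` is case-sensitive and matches anywhere in the name. The sketch below illustrates what that means with Python's `re` module on a few invented project names (they are not taken from the dataset). For a plain literal like this one, Python and MongoDB regular expressions should behave the same way, although the two engines differ on more advanced syntax.

```python
import re

# Invented sample names, for illustration only
names = ["Sports Almanac", "E-Sport League", "Transport Hub", "sport shoes"]

case_sensitive = [n for n in names if re.search("Sport", n)]
case_insensitive = [n for n in names if re.search("Sport", n, re.IGNORECASE)]

print(case_sensitive)
print(case_insensitive)

# Case-insensitive variant of the Mongo filter (not executed here):
# collection.count_documents({"name": {"$regex": "Sport", "$options": "i"}})
```

Ignoring case also picks up words that merely contain the letters, such as "Transport", so the stricter pattern may be closer to what the question intends.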