### Problem statement

The data comes from a platform that hosts coding challenges. The platform has an unusual business model: it crowdsources problems from various creators (authors). These authors create problems and release them on the client's platform. Users then select the challenges they want to solve.

This dataset contains one document per coding problem. Each document describes the problem, the author who created it, and the list of users who have attempted it.

Below are the fields that can be found within each document in the collection:

- `challenge_id` - Unique id of the challenge problem

- `programming_language` - Programming language for the challenge

- `total_submissions` - Total submissions by all users

- `publish_date` - Publishing date for the challenge

- `author` - Embedded document about the author of the challenge.
> - `id` - Author id
> - `gender` - Author gender
> - `org_id` - Organisation id of the author

- `users` - List of users who have attempted the challenge

----

### Connecting to MongoDB


----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp


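# Override the `sorted` name inside the pprint module with a pass-through,
# so dictionary keys are printed in their stored order instead of alphabetically
pp.sorted = lambda x, key=None: x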

In [2]:
client = pymongo.MongoClient("mongodb://localhost:27017/")

---
### Importing data

----

In [3]:
db = client.Avidhya

In [4]:
db.list_collection_names()

['Assignment']

In [5]:
pp.pprint(
    db.Assignment.find_one()
)

{'_id': ObjectId('60dab9f75945974466d8d64d'),
 'challenge_id': 'CI23478',
 'programming_language': 2,
 'total_submissions': 37,
 'publish_date': datetime.datetime(2006, 6, 5, 0, 0),
 'author': {'id': 'AI563576', 'gender': 'M', 'org_id': 'AOI100001'},
 'users': [32876, 88820, 97150, 97359]}


---
### Assignment Questions

----

### Q1. 

Find the number of documents in the collection

In [6]:
# number of documents in the collection
db.Assignment.count_documents({})

5606
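
For whole-collection counts, PyMongo's `Collection` offers two methods: `count_documents({})`, which runs the query and returns an exact count, and `estimated_document_count()`, which reads collection metadata and is faster but may be approximate. A minimal sketch, assuming a hypothetical helper named `count_all` that accepts any object exposing those two methods:

```python
def count_all(collection, exact=True):
    """Return the number of documents in `collection`.

    `count_all` is a hypothetical helper for illustration; `collection`
    is expected to behave like a pymongo Collection.
    """
    if exact:
        # An empty filter matches every document.
        return collection.count_documents({})
    # Metadata-based count: fast, but not guaranteed to be exact.
    return collection.estimated_document_count()
```

With the connection opened earlier, `count_all(db.Assignment)` would be the exact form of the count.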

### Q2. 

Find the number of unique `programming_language` and `challenge_id`

In [7]:
# Unique values in programming_language
len(db.Assignment.distinct('programming_language'))

3

In [8]:
# Unique values in challenge_id
len(db.Assignment.distinct('challenge_id'))

5606
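
Because the number of distinct `challenge_id` values equals the document count from Q1 (5606), every document carries a different `challenge_id`. That comparison can be sketched as a small check, assuming a hypothetical helper `is_unique_field` and a collection object with PyMongo's `distinct` and `count_documents` methods:

```python
def is_unique_field(collection, field):
    """Return True if every document has a different value for `field`.

    Hypothetical helper; assumes `field` holds a scalar (not an array)
    and that `collection` behaves like a pymongo Collection.
    """
    n_distinct = len(collection.distinct(field))
    n_docs = collection.count_documents({})
    return n_distinct == n_docs
```

By the counts above, `is_unique_field(db.Assignment, 'challenge_id')` would be expected to hold, while the same check on `programming_language` would not.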

### Q3. 

How many documents are there where the challenge was published between `2009-01-01` and `2010-01-01` (both dates inclusive)?

In [9]:
# Import datetime library
from datetime import datetime

In [10]:
db.Assignment.count_documents(
    {
        'publish_date': {
            '$gte': datetime(2009, 1, 1),
            '$lte': datetime(2010, 1, 1)
        }
    }
)

888
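
The `$gte`/`$lte` pair makes both ends of the range inclusive, so a challenge published exactly at midnight on 2010-01-01 is counted. If a half-open range is wanted instead, `$lt` can replace `$lte`. A minimal sketch, assuming a hypothetical helper `date_range_filter` that only builds the filter dictionary and does not touch the database:

```python
from datetime import datetime

def date_range_filter(field, start, end, include_end=True):
    """Build a MongoDB range filter on `field` (hypothetical helper)."""
    upper_op = '$lte' if include_end else '$lt'
    return {field: {'$gte': start, upper_op: end}}

# Same filter as the one used for Q3 (both ends inclusive)
inclusive = date_range_filter('publish_date',
                              datetime(2009, 1, 1), datetime(2010, 1, 1))

# Variant that excludes the upper bound
half_open = date_range_filter('publish_date',
                              datetime(2009, 1, 1), datetime(2010, 1, 1),
                              include_end=False)
```

Either dictionary can be passed as the filter argument of a count or find call.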

### Q4. 

How many challenges has author `AI563576` written in either `programming_language` `1` or `3`?


In [11]:
db.Assignment.count_documents(
    {
        'author.id': 'AI563576',
        'programming_language': {'$in': [1, 3]}
    }
)

41
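
The `'author.id'` key uses dot notation to reach into the embedded `author` document, and `$in` matches when the field equals any value in the list. A pure-Python sketch of both checks, using a trimmed copy of the sample document printed earlier; `get_path` is a hypothetical helper, not part of PyMongo:

```python
def get_path(doc, dotted_key):
    """Follow a dotted key such as 'author.id' through nested dicts.

    Hypothetical helper that mimics the lookup for embedded documents
    only; it does not handle arrays. Returns None if any part is missing.
    """
    current = doc
    for part in dotted_key.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

# Trimmed copy of the sample document shown by find_one()
sample = {
    'challenge_id': 'CI23478',
    'programming_language': 2,
    'author': {'id': 'AI563576', 'gender': 'M', 'org_id': 'AOI100001'},
}

author_id = get_path(sample, 'author.id')                  # 'AI563576'
in_languages = sample['programming_language'] in [1, 3]    # what $in checks
```

The sample document is by the right author but uses language `2`, so it would not be among the matches.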

### Q5. 

How many documents are there where the challenge was created by a female author and where the author belongs to either the 'AOI100013' organisation or the 'AOI100013' organisation?

In [12]:
db.Assignment.count_documents(
    {
        'author.gender': 'F',
        'author.org_id': {'$in': ['AOI100013', 'AOI100013']}
    }
)

5

### Q6.

Find the top 5 challenges (sorted by `challenge_id` in ascending order) that have either been attempted by exactly 100 `users` or have `total_submissions` between 100 and 200, both inclusive.

In [13]:
cur = db.Assignment.find(
                        # Query
                        {
                          '$or':[
                                  {'users':{'$size':100}},
                                  {'total_submissions':{ '$gte': 100,'$lte': 200 }}
                                ]
                          },
                          # challenge_id Projection
                        {'challenge_id': 1,
                            '_id': 0
                        }
                        ).sort([('challenge_id', pymongo.ASCENDING)]).limit(5)
# Print all documents
for doc in cur:
    pp.pprint(doc)

{'challenge_id': 'CI23482'}
{'challenge_id': 'CI23494'}
{'challenge_id': 'CI23497'}
{'challenge_id': 'CI23500'}
{'challenge_id': 'CI23516'}
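
To make the filter's logic concrete: `$size` matches only arrays with exactly the given length, and `$or` keeps a document if at least one branch matches. A pure-Python sketch of the same predicate, assuming a hypothetical function `matches_q6` and documents that contain both fields; the second and third documents are made up for illustration:

```python
def matches_q6(doc):
    """Pure-Python equivalent of the Q6 filter (hypothetical helper)."""
    exactly_100_users = len(doc['users']) == 100                 # $size: 100
    submissions_in_range = 100 <= doc['total_submissions'] <= 200  # $gte / $lte
    return exactly_100_users or submissions_in_range             # $or

# Values taken from the sample document printed earlier
sample = {'users': [32876, 88820, 97150, 97359], 'total_submissions': 37}

# Made-up documents, for illustration only
many_users = {'users': list(range(100)), 'total_submissions': 500}
mid_submissions = {'users': [1, 2, 3], 'total_submissions': 200}

results = [matches_q6(d) for d in (sample, many_users, mid_submissions)]
```

Only the two made-up documents satisfy the predicate: one through the `users` branch, the other through the inclusive upper bound on `total_submissions`.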


### Q7. 

How many documents are there where either `publish_date > 2010-01-01` and `total_submissions > 100`, or `publish_date < 2000-01-01` and `total_submissions > 1000`?

In [14]:
db.Assignment.count_documents({
    '$or': [
        {'$and': [
            {'publish_date': {'$gt': datetime(2010, 1, 1)}},
            {'total_submissions': {'$gt': 100}}
        ]},
        {'$and': [
            {'publish_date': {'$lt': datetime(2000, 1, 1)}},
            {'total_submissions': {'$gt': 1000}}
        ]}
    ]
})

45
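
When each condition inside an `$and` targets a different field, MongoDB treats a plain document with several keys as an implicit AND, so the explicit `$and` arrays are optional here. A sketch of an equivalent, shorter filter; the variable name `q7_filter` is an assumption for illustration:

```python
from datetime import datetime

# Each branch lists two fields in one document, which MongoDB ANDs together
q7_filter = {
    '$or': [
        {'publish_date': {'$gt': datetime(2010, 1, 1)},
         'total_submissions': {'$gt': 100}},
        {'publish_date': {'$lt': datetime(2000, 1, 1)},
         'total_submissions': {'$gt': 1000}},
    ]
}
```

Passing `q7_filter` as the filter should return the same count as the query above.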