# Query Exploration

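This notebook shows example queries for training and offline validation on the CodeSearchNet dataset: docstrings sampled from the test split of each language, followed by query-like examples from the hidden dataset.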

In [2]:
import json
from pathlib import Path
from pprint import pprint

import pandas as pd

pd.set_option('display.max_colwidth', 300)

Before downloading the entire dataset, it can be useful to explore a small sample to understand the format and structure of the data. The full dataset can be downloaded automatically with the `/script/setup` script in this repo, but we can instead download the archive for a single language from S3.

The s3 links follow this pattern:

> https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{python,java,go,php,ruby,javascript}.zip

For example, the link for the `python` data is:

> https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/python.zip
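Because the archives differ only by language name, the link can be built from a template. The sketch below does this for the six languages listed in the pattern above; `LANGUAGES`, `BASE_URL`, and `dataset_url` are illustrative names, not part of the CodeSearchNet tooling.

```python
# Languages listed in the S3 link pattern above.
LANGUAGES = ["python", "java", "go", "php", "ruby", "javascript"]

# Common prefix of every per-language archive link.
BASE_URL = "https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2"


def dataset_url(language: str) -> str:
    """Return the S3 archive link for one language (hypothetical helper)."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    return f"{BASE_URL}/{language}.zip"


for lang in LANGUAGES:
    print(dataset_url(lang))
```

Rejecting unknown names up front gives a clear error instead of a failed download of a non-existent archive.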

In [35]:
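def print_10_docstrings(language: str):
    """Download one language's archive and print ~10 evenly spaced test docstrings."""
    print(f'---Print docstrings for test code snippets in {language}')
    # Download the archive from S3 (-N: only if newer than the local copy, -q: quiet)
    link_to_dataset_part = f'https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{language}.zip'
    !wget -Nq {link_to_dataset_part}

    # Extract the archive, overwriting existing files without prompting
    zip_name = f'{language}.zip'
    !unzip -oq {zip_name}

    # Decompress the first test shard in place (-f: overwrite an existing .jsonl)
    test_file_path = f'{language}/final/jsonl/test/{language}_test_0.jsonl.gz'
    !gzip -dfq {test_file_path}

    # Each line of the decompressed file is one JSON record
    with open(f'{language}/final/jsonl/test/{language}_test_0.jsonl', 'r', encoding='utf-8') as f:
        sample_file = f.readlines()

    # Step through the file so the samples are spread across it;
    # max(1, ...) avoids a zero step when the file has fewer than 10 lines
    step = max(1, len(sample_file) // 10)
    for i in range(0, len(sample_file), step):
        print()
        print(f'____{language}_{i}_____')
        print(json.loads(sample_file[i])['docstring'])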
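The `!wget`, `!unzip`, and `!gzip` lines only work inside IPython/Jupyter. As a pure-Python alternative, the sketch below reads a shard directly from a downloaded `{language}.zip` with the standard-library `zipfile` and `gzip` modules, without extracting anything to disk. It assumes the archive layout implied by the path used above (`{language}/final/jsonl/{split}/*.jsonl.gz`) and that every record has a `docstring` field; `list_shards`, `iter_docstrings`, and `sample_evenly` are hypothetical helper names.

```python
import gzip
import json
import zipfile
from typing import Iterator, List


def list_shards(zip_path: str, language: str, split: str = "test") -> List[str]:
    """Return the gzipped JSONL shard names for one split, sorted by name.

    Assumes the layout {language}/final/jsonl/{split}/*.jsonl.gz.
    """
    prefix = f"{language}/final/jsonl/{split}/"
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith(prefix) and name.endswith(".jsonl.gz")
        )


def iter_docstrings(zip_path: str, member: str) -> Iterator[str]:
    """Yield the 'docstring' field of each JSON line in one gzipped shard."""
    with zipfile.ZipFile(zip_path) as archive:
        # Stream the .gz member out of the zip and decompress it on the fly.
        with archive.open(member) as raw, gzip.open(raw, "rt", encoding="utf-8") as lines:
            for line in lines:
                yield json.loads(line)["docstring"]


def sample_evenly(items: List[str], k: int = 10) -> List[str]:
    """Pick at most k items spread evenly across the list."""
    if not items:
        return []
    step = max(1, len(items) // k)  # never a zero step for short lists
    return items[::step][:k]
```

With `python.zip` already downloaded by the cell above, `list_shards('python.zip', 'python')` should include `python/final/jsonl/test/python_test_0.jsonl.gz`, and passing the docstrings from that shard to `sample_evenly` returns at most ten of them. Reading from the archive keeps the notebook portable, at the cost of decompressing the shard again on every call.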

## Python dataset

In [36]:
print_10_docstrings('python')

---Print docstrings for test code snippets in python

____python_0_____
Extracts video ID from URL.

____python_2217_____
Obtain the reconstruction error for the input test_data.

        :param H2OFrame test_data: The dataset upon which the reconstruction error is computed.
        :param bool per_feature: Whether to return the square reconstruction error per feature.
            Otherwise, return the mean square error.

        :returns: the reconstruction error.

____python_4434_____
>>> string = 'apple orange "banana tree" green'
    >>> splitstring(string)
    ['apple', 'orange', 'green', '"banana tree"']

____python_6651_____
Implements the request/response pattern via pub/sub
        using a single wildcard subscription that handles
        the responses.

____python_8868_____
:param file_inp:     a `filename` or ``sys.stdin``?
    :param file_out:     a `filename` or ``sys.stdout`?`

____python_11085_____
Format output using *format_name*.

    This is a wrapper around the :cla

## Ruby dataset

In [37]:
print_10_docstrings('ruby')

---Print docstrings for test code snippets in ruby

____ruby_0_____
Returns a hash in the following format:
 {
   "pod/web-1" => [
     "Pulling: pulling image "hello-world:latest" (1 events)",
     "Pulled: Successfully pulled image "hello-world:latest" (1 events)"
   ]
 }

____ruby_227_____
Enforces the `version_limit`, if set. Default: no limit.
 @api private

____ruby_454_____
Gather slices from params and axis according to indices.

____ruby_681_____
Parse all results in the batch.  Add records to shared list.
 If the record was not found, the bins will be nil.

____ruby_908_____
Adds the file reference with given UUID.

 @param [String] uuid UUID of the object.

____ruby_1135_____
The main method implementing Ruby-like access methods for nested elements

____ruby_1362_____
Stop validating at the Question node

____ruby_1589_____
Upon a failure at the first URL, will automatically retry with the
 second & third ones before finally raising an exception
 Returns an HTTPResponse obje

## PHP dataset

In [38]:
print_10_docstrings('php')

---Print docstrings for test code snippets in php

____php_0_____
Auto generated seed file.

@return void

____php_2839_____
Attach a function as a server method

@param array|string $function Function name, array of function names to attach,
or SOAP_FUNCTIONS_ALL to attach all functions
@param  string $namespace Ignored
@return Zend_Soap_Server
@throws Zend_Soap_Server_Exception on invalid functions

____php_5678_____
响应命令. (English: "Respond to the command.")

@param \Leevel\Kernel\IApp $app

____php_8517_____
Creates a default WP-CLI packages composer.json.

@param string $composer_path Where the composer.json should be created
@return string Returns the absolute path of the newly created default WP-CLI packages composer.json.

____php_11356_____
Gets the result set for date/pageview pairs

@return ArrayList

____php_14195_____
Set a bulk of input parameters from and array.

@param array $arrayOfParameters

____php_17034_____
Gets an array of files to lint.

@param array $files       array of files to check
@param arra

## JavaScript dataset

In [39]:
print_10_docstrings('javascript')

---Print docstrings for test code snippets in javascript

____javascript_0_____
Create an instance of Axios

@param {Object} defaultConfig The default config for the instance
@return {Axios} A new instance of Axios

____javascript_648_____
a function returning the mutations object

@export
@param {object} userState
@returns {AnyObject} the mutations object

____javascript_1296_____
Get all contents of the table/json file object
@param  {string} arguments[0] [Table name]
@param  {string} arguments[1] [Location of the database file] (Optional)
@param  {Function} arguments[2]  [callback function]
 function getAll(tableName, callback) {

____javascript_1944_____
find method in klass prototype chain

____javascript_2592_____
Returns completions for markup syntaxes (HTML, Slim, Pug etc.)
@param  {CodeMirror} editor
@param  {CodeMirror.Position} pos Cursor position in editor
@param  {Object} config Resolved Emmet config
@return {EmmetCompletion[]}

____javascript_3240_____
Listen to chart eve

## Java dataset

In [40]:
print_10_docstrings('java')

---Print docstrings for test code snippets in java

____java_0_____
Makes sure the fast-path emits in order.
@param value the value to emit or queue up
@param delayError if true, errors are delayed until the source has terminated
@param disposable the resource to dispose if the drain terminates

____java_2690_____
Generates a JavaScript reverse router.

@param name the router's name
@param routes the reverse routes for this router
@return the router
@deprecated Deprecated as of 2.7.0. Use {@link #create(String, String, String,
JavaScriptReverseRoute...)} instead.

____java_5380_____
Set the subset of columns to read (projection pushdown). Specified as an Avro
schema, the requested projection is converted into a Parquet schema for Parquet
column projection.
<p>
This is useful if the full schema is large and you only want to read a few
columns, since it saves time by not reading unused columns.
<p>
If a requested projection is set, then the Avro schema used for reading
must be compatible

## Go dataset

In [41]:
print_10_docstrings('go')

---Print docstrings for test code snippets in go

____go_0_____
// mustWaitPinReady waits up to 3-second until connection is up (pin endpoint).
// Fatal on time-out.

____go_1429_____
// New generator for creating a Buffalo Web application

____go_2858_____
// DeleteOperation deletes (cancels) a running operation

____go_4287_____
// Descendants returns a slice containing all descendants of a node, 'id',
// in d which are an ancestor of at least one of the nodes in 'to'.

____go_5716_____
// Attr returns the value of the named attribute. nil is returned when the
// attribute is not set.

____go_7145_____
// withDeadline is like context.WithDeadline, except it ignores the zero deadline.

____go_8574_____
// MarshalJSON supports json.Marshaler interface

____go_10003_____
// UnmarshalJSON supports json.Unmarshaler interface

____go_11432_____
// PublicationLineageLocator builds a locator from the given href.

____go_12861_____
// RoundTrip calls f(r).

____go_14290_____
// GetInnkeeperCl

## Examples of query-like text from the hidden dataset

1. Output to html file
2. How to determine if a string is a valid word
3. Convert int to string
4. Read JSON data
5. How to read .csv file in an efficient way?
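The hidden-set queries above are short natural-language phrases, whereas many of the sampled docstrings carry tag lines (`:param`, `@param`, `@return`) or Go's `//` comment markers. One hypothetical way to make a docstring look more like such a query is to keep its first untagged line with the comment marker removed. `docstring_to_query` is an illustrative heuristic, not part of CodeSearchNet's preprocessing; for example, it does not filter doctest lines starting with `>>>`.

```python
import re

# Leading "//" comment markers, as seen in the Go samples (assumption: other
# comment styles are not handled).
_COMMENT_PREFIX = re.compile(r"^\s*//+\s?")
# Tag lines such as ":param x:", "@param", "@return", "@api".
_TAG_LINE = re.compile(r"^\s*[:@]\w+")


def docstring_to_query(docstring: str) -> str:
    """Heuristically reduce a docstring to a short, query-like summary.

    Returns the first non-empty, non-tag line with any leading "//" removed
    and trailing periods dropped, or "" if no such line exists.
    """
    for line in docstring.splitlines():
        text = _COMMENT_PREFIX.sub("", line).strip()
        if not text or _TAG_LINE.match(text):
            continue
        return text.rstrip(".")
    return ""
```

For instance, `docstring_to_query('// RoundTrip calls f(r).')` gives `'RoundTrip calls f(r)'`. Taking only the first line is crude (multi-line summaries get cut), but it keeps the output close to the length of the hidden-set queries.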