Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add generic GSoC 2017 import app #67

Closed
wants to merge 1 commit into from
Closed

[WIP] Add generic GSoC 2017 import app #67

wants to merge 1 commit into from

Conversation

pradeepgangwar
Copy link

@pradeepgangwar pradeepgangwar commented Dec 22, 2017

  • currently work in progress.
  • Project rendering not implemented but rendering organization data implemented for reference and same can be done with projects once i get some reviews.
  • If YAML files are needed for reference i can add them here.

Please review and let me know changes it needs.

Related issue: #59

@pradeepgangwar
Copy link
Author

pradeepgangwar commented Dec 22, 2017

Preview of what is extracted (org info)
https://deploy-preview-67--coala-community.netlify.com/gsoc/

gsoc/admin.py Outdated
@@ -0,0 +1,3 @@
from django.contrib import admin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cant use admin in a django-distill system

gsoc/config.py Outdated

def load_cache(filename):
with open(os.path.join(DATA_DIR, filename), 'r') as f:
return ruamel.yaml.load(f, Loader=ruamel.yaml.Loader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EOL at EOF

gsoc/config.py Outdated
import ruamel.yaml
import os

DATA_DIR = os.path.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a global setting for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this. I need more help here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git grep _site

Copy link
Author

@pradeepgangwar pradeepgangwar Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched but could not find any related setting other than STATIC_ROOT mentioned in community.settings as _site. I actually found this snippet of from gci app.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is it.

gsoc/tests.py Outdated
@@ -0,0 +1,3 @@
from django.test import TestCase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this file until there are tests

<body>

{% if error %}
<h3> {{error}} </h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after the scraper is moved into django commands, the error log will appear at /log created in #63 , so you dont need to do error presentation here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually error log. It appears in case when there are no data for a org. I am caching "no data present" also because it was stated in issue that we need to cache the fact that there is not data present for a org. Probably i should rename it? :).

Copy link
Member

@jayvdb jayvdb Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears in case when there are no data for a org.

That is an error. It means the GSOC app needs to be disabled.

Copy link
Author

@pradeepgangwar pradeepgangwar Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed this part. If related yaml file is not found then we show error. (file is not created if data is not found)

That is an error. It means the GSOC app needs to be disabled.

I do not delete existing yaml files on new query. Should i delete existing yaml files so that if query is made for another year(for which there is no data) so it does not render data from existing files. If there is no data no file gets created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this! Do not render errors here. This app should not exist if there was no gsoc data. it isnt a gsoc org. no app. no yaml. no page. using logging to record the gsoc app was disabled. no other visible trace.

technology['%s'% str(count)] = tech
count = count + 1

item = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double space.

project_data[int(item['id'])] = item
project_data = OrderedDict(sorted(project_data.items(), key=lambda t: t[0]))

with open(os.path.join('_site', 'project_info.yaml'), 'a') as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefix fix with gsoc


def parse_project(self, response):
mentors = []
org_url = "https://summerofcode.withgoogle.com/archive/2017/organizations/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

year should be a parameter, as we'll also want to import other years, and other orgs will have many years of data to import.

yield response.follow(follow_link, self.parse_org)

def parse_org(self, response):
project_url = "https://summerofcode.withgoogle.com/archive/2017/projects/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

year ; see below

gsoc/views.py Outdated
name = organisation[1].get('name')
return render(request, 'gsoc.html', {'error': name})

id = organisation[1].get('id')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why index of 1? that is a bit odd. the yaml should only have one org in it.

Copy link
Author

@pradeepgangwar pradeepgangwar Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this running this way only. I think that's how it inteprets yaml. Although i will reconsider it and try it via key, value type object. Or it would be good if i get some suggestions :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you dont need an array for one org. the Google json data structure that you fetch should not be the same as the coala org yaml data structure that you write to disk.

</head>

<body>
<h3 style="text-align: center;"> Coala </h3>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line contains the keyword 'Coala'.

Origin: KeywordBear, Section: generalization.

</head>

<body>
<h3 style="text-align: center;"> coala </h3>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line contains the keyword 'coala'.

Origin: KeywordBear, Section: generalization.

gsoc/config.py Outdated
return ruamel.yaml.load(f, Loader=ruamel.yaml.Loader)

def get_year():
return "2017"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EOL at EOF.

but more importantly, returning 2017 here is inherently wrong. the year is now 2018. The GSoC 2018 program has now started.

And you want 2018 - 1

@@ -0,0 +1,14 @@
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is junk added by the generator to guide someone writing their code. It doesnt need to be in our repository

class GsocSpider(scrapy.Spider):
name = "gsoc"
start_urls = [
"https://summerofcode.withgoogle.com/archive/{year}/organizations".format(year = year),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pycodestyle on your code

@coala coala deleted a comment from gitmate-bot Jan 10, 2018
@coala coala deleted a comment from gitmate-bot Jan 10, 2018
@coala coala deleted a comment from gitmate-bot Jan 10, 2018
gsoc/config.py Outdated
import ruamel.yaml
import os

DATA_DIR = os.path.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is it.

gsoc/config.py Outdated

DATA_DIR = os.path.join(
os.path.dirname(__file__), '..',
'_site'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use STATIC_ROOT here. (also create a newcomer issue to fix this in gci app)

gsoc/views.py Outdated
for technology in org.get(key).get('technologies').values():
tech.append(technology)

return render(request, 'gsoc.html', {'id': id, 'name': name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the substitution parameters to the next line

gsoc/views.py Outdated
logger.info('GSoC data not available')
error = 'No GSoC data available for {name} year {year}'.format(
name=org_name, year=year)
return render(request, 'gsoc.html', {'error': error})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, stop rendering an error. use logging to record the error. and then re-raise the exception here.

gsoc/views.py Outdated
}
projects_list.append(item)
return render(request, 'gsoc_projects.html',
{'project_list': projects_list, 'org_name': name})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lay out the these substitution variables so that you can add another one without modifying the existing lines.

<body>

{% if error %}
<h3> {{error}} </h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this! Do not render errors here. This app should not exist if there was no gsoc data. it isnt a gsoc org. no app. no yaml. no page. using logging to record the gsoc app was disabled. no other visible trace.


class GsocSpider(scrapy.Spider):
name = 'gsoc'
start_urls = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not use the preferred quotation marks.

Origin: QuotesBear, Section: all.python.

The issue can be fixed by applying the following patch:

--- a/tmp/tmpvhnur60c/gsocscrape/spiders/gsoc.py
+++ b/tmp/tmpvhnur60c/gsocscrape/spiders/gsoc.py
@@ -33,7 +33,7 @@
         if(organization_link):
             organization_link = organization_link[0].extract().split('/')[4]
         else:
-            logger.info("Organisation {} does not exist in GSoC for {}".format(
+            logger.info('Organisation {} does not exist in GSoC for {}'.format(
                 org_name, year
             ))
             return

tagline = org.get(key).get('tagline')
description = org.get(key).get('description')
tech = []
for technology in org.get(key).get('technologies').values():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code does not comply to PEP8.

Origin: PEP8Bear, Section: all.autopep8.

The issue can be fixed by applying the following patch:

--- a/tmp/tmpvhnur60c/gsoc/views.py
+++ b/tmp/tmpvhnur60c/gsoc/views.py
@@ -33,7 +33,7 @@
                                                  'tagline': tagline,
                                                  'description': description,
                                                  'tech': tech
-                                                })
+                                                 })
 
 
 def projects(request):

for key in org.keys():
name = org.get(key).get('name')
projects = get_projects_data()
projects_list = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code does not comply to PEP8.

Origin: PEP8Bear, Section: all.autopep8.

The issue can be fixed by applying the following patch:

--- a/tmp/tmpvhnur60c/gsoc/views.py
+++ b/tmp/tmpvhnur60c/gsoc/views.py
@@ -61,6 +61,6 @@
                 'mentors': mentors
             }
             projects_list.append(item)
-        return render(request, 'gsoc_projects.html',{'project_list': projects_list,
-                                                     'org_name': name
-                                                    })
+        return render(request, 'gsoc_projects.html', {'project_list': projects_list,
+                                                      'org_name': name
+                                                      })

tagline = org.get(key).get('tagline')
description = org.get(key).get('description')
tech = []
for technology in org.get(key).get('technologies').values():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E124 closing bracket does not match visual indentation

Origin: PycodestyleBear (E124), Section: all.autopep8.

for key in org.keys():
name = org.get(key).get('name')
projects = get_projects_data()
projects_list = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E231 missing whitespace after ','

Origin: PycodestyleBear (E231), Section: all.autopep8.

for key in org.keys():
name = org.get(key).get('name')
projects = get_projects_data()
projects_list = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E501 line too long (83 > 80 characters)

Origin: PycodestyleBear (E501), Section: all.autopep8.

projects = get_projects_data()
projects_list = []
for key in projects.keys():
mentors = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E124 closing bracket does not match visual indentation

Origin: PycodestyleBear (E124), Section: all.autopep8.

for key in org.keys():
name = org.get(key).get('name')
projects = get_projects_data()
projects_list = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is longer than allowed. (83 > 80)

Origin: LineLengthBear, Section: all.linelength.

@pradeepgangwar
Copy link
Author

pradeepgangwar commented Jan 14, 2018

No file gets created if there is no data. I raised 404 exception as i didn't get any way to disable an app. Is there any way to disable the app for a condition.

@jayvdb
Copy link
Member

jayvdb commented Jan 28, 2018

raising 404 is good enough.

Note that there is no problem with the GCI app. master builds on cron work every night. see https://travis-ci.org/coala/community/builds . Something is wrong in your changes.

@pradeepgangwar
Copy link
Author

@jayvdb I will update it. Thanks.

@gitmate-bot gitmate-bot added size/M and removed size/L labels Feb 5, 2018
@RaiVaibhav
Copy link
Member

@pradeepgangwar @jayvdb I tried to clone the forked repo and tried to run the .ci scripts so that I can see the data locally but an error occurred
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pradeepgangwar.github.io/community/static/tasks.yaml

Please help me in setting up.

@RaiVaibhav
Copy link
Member

Please tell me if I am missing some step

@pradeepgangwar
Copy link
Author

Closing this due to inactivity. I am not able to take it to completion.

@pradeepgangwar pradeepgangwar deleted the gsoc-import branch August 29, 2018 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants