Github index not generated when using grimoirelab/full image #233

Closed
rootsongjc opened this issue Nov 7, 2019 · 16 comments

@rootsongjc

rootsongjc commented Nov 7, 2019

I set up GrimoireLab with the grimoirelab/full Docker image.

Here is my setup command.

#!/bin/bash
docker run -p 5601:5601 -p 9000:9200 \
    -v es-data:/var/lib/elasticsearch \
    -v projects.json:/projects.json  \
    -v dashboard.cfg:/dashboard.cfg \
    -v setup.cfg:/override.cfg \
    --name test \
    -t grimoirelab/full

Here are my config files.

projects.json

{
    "istio-handbook": {
        "git": [
            "https://github.com/servicemesher/istio-handbook"
        ],
        "gitub": [
            "https://github.com/servicemesher/istio-handbook"
        ]
    }
}

setup.cfg

[github]
api-token = $MY_API_TOKEN

[projects]
projects_file = /projects.json

[general]
short_name = Ant Financial

[sortinghat]
# Infrastructure for SortingHat (MariaDB/MySQL database)
host = localhost
user = grimoirelab
# Uncomment this before init
password =
database = grimoirelab_sh

# Use organizations file
load_orgs = true
orgs_file = /orgs.json

# Identities file in GrimoireLab format
identities_file = [/identities.yaml]
identities_format = grimoirelab

# Organization name for people not affiliated to any organization
unaffiliated_group = Unknown

# Ids known to be bots
bots_names = [sofastack-bot,antfin-oss]

# How to autoprofile
autoprofile = [Ant Financial:manual,git,github]

dashboard.cfg

# Mordred configuration file (Dashboard)
# Parameters related to the dashboard and how it is produced
# This is usually updated by the person maintaining the dashboard
#
# List: [val1, val2 ...]
# Int: int_value
# Int as string: "Int"
# List as string: "[val1, val2 ...]"
# String: string_value
# None: None, none
# Boolean: true, True, False, false

[general]
# Update incrementally, forever
update = true
# Don't start a new update earlier than (since last update, seconds)
min_update_delay = 300
# Produce debugging data for the logs
debug = true

[es_enrichment]
# Refresh identities and projects for all items after enrichment
autorefresh = true

[sortinghat]
# Run affiliation
affiliate = True
# How to match to unify
matching = [email]
# How long to sleep before running again, for identities tasks
sleep_for = 100

[panels]
# Dashboard: default time frame
kibiter_time_from = "now-1y"
# Dashboard: default index pattern
kibiter_default_index = "git"
# GitHub repos panels
github-repos = true

[phases]
collection = true
identities = true
enrichment = true
panels = true

[git]
# Names for raw and enriched indexes
raw_index = git_grimoirelab-raw
enriched_index = git_grimoirelab
studies = [enrich_demography:git, enrich_areas_of_code:git, enrich_onion:git]

[github]
# Names for raw and enriched indexes
raw_index = github_grimoirelab-raw
enriched_index = github_grimoirelab
# Sleep if the GitHub API rate is exhausted, wait until it is recovered
sleep-for-rate = true

[github:repo]
raw_index = github_grimoirelab_stats-raw
enriched_index = github_grimoirelab_stats
category = repository
no-archive = true
sleep-for-rate = true

# Studies

[enrich_demography:git]
#no_incremental = true   # default: false

[enrich_areas_of_code:git]
in_index = git_grimoirelab-raw
out_index = git_aoc_grimoirelab-enriched
#sort_on_field = metadata__timestamp
#no_incremental = false

[enrich_onion:git]
in_index = git_grimoirelab
out_index = git_onion_grimoirelab-enriched
#data_source = git
#contribs_field = hash
#timeframe_field = grimoire_creation_date
#sort_on_field = metadata__timestamp
#no_incremental = false

Debug

After the container had been running for a while, I logged in and ran this command to check whether the indexes had been generated.

$ curl http://localhost:9200/_cat/indices
yellow open git_grimoirelab-raw            f6s8DYd2TieLPIxMo4beTA 5 1 111  1   327kb   327kb
yellow open git_onion_grimoirelab-enriched O_vHjZLeTDGvwIOvC2VQuw 5 1 100  0  82.9kb  82.9kb
yellow open .kibana                        BLarSmeOSK2J5_KVT2gHkA 1 1 186 29 269.1kb 269.1kb
yellow open git_grimoirelab                ViqKMZJKTmOtcmE46qF4Ng 5 1 111  0   1.4mb   1.4mb
yellow open git_aoc_grimoirelab-enriched   Tim9zhBIQnyW_1Ooh1ebBA 5 1 899  0   2.1mb   2.1mb
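
A quick way to spot which configured indexes are missing is to compare the `raw_index` / `enriched_index` values from the cfg file with the names listed by `_cat/indices`. The following is only a minimal sketch: the helper names are hypothetical, and the `CFG` and `CAT` strings are shortened excerpts of the files and output shown above.

```python
import configparser


def expected_indices(cfg_text):
    """Collect raw_index / enriched_index values from a SirMordred-style cfg."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    names = set()
    for section in parser.sections():
        for key in ("raw_index", "enriched_index"):
            if parser.has_option(section, key):
                names.add(parser.get(section, key))
    return names


def present_indices(cat_output):
    """Extract index names (third column) from `_cat/indices` text output."""
    names = set()
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines and an optional header row.
        if len(fields) >= 3 and fields[1] in ("open", "close"):
            names.add(fields[2])
    return names


# Shortened excerpt of dashboard.cfg (assumption: only these two sections matter here).
CFG = """
[git]
raw_index = git_grimoirelab-raw
enriched_index = git_grimoirelab

[github]
raw_index = github_grimoirelab-raw
enriched_index = github_grimoirelab
"""

# Shortened excerpt of the `_cat/indices` output.
CAT = """
yellow open git_grimoirelab-raw f6s8DYd2TieLPIxMo4beTA 5 1 111 1 327kb 327kb
yellow open git_grimoirelab     ViqKMZJKTmOtcmE46qF4Ng 5 1 111 0 1.4mb 1.4mb
"""

missing = sorted(expected_indices(CFG) - present_indices(CAT))
print(missing)  # ['github_grimoirelab', 'github_grimoirelab-raw']
```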

No GitHub data.

How should I configure it so that the github_grimoirelab index is generated and the analysis runs?

@rootsongjc rootsongjc changed the title Github index not generated when using gimoirelab/full image Github index not generated when using grimoirelab/full image Nov 7, 2019
@sduenas
Member

sduenas commented Nov 7, 2019

I don't see a GitHub token in your configuration for analyzing those repositories. The process may still be running, because the rate limit for unauthenticated users is 60 requests per hour, which runs out quickly.

I recommend adding your GitHub token to the [github] section with the api-token parameter.

[github]
api-token = xxxxxxxxxxx
raw_index = github_grimoirelab-raw
enriched_index = github_grimoirelab
# Sleep if the GitHub API rate is exhausted, wait until it is recovered
sleep-for-rate = true
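
To put the 60 requests per hour figure in perspective, here is a rough back-of-the-envelope sketch. The workload of 3000 requests is purely hypothetical, and the 5000 requests per hour for authenticated use is an assumption based on GitHub's usual per-account limit, not something configured in GrimoireLab.

```python
import math

UNAUTHENTICATED_PER_HOUR = 60   # limit mentioned in this thread
AUTHENTICATED_PER_HOUR = 5000   # assumption: usual GitHub limit for token requests


def hours_needed(api_requests, limit_per_hour):
    """Number of one-hour rate-limit windows needed for `api_requests` calls."""
    return math.ceil(api_requests / limit_per_hour)


# Hypothetical workload: 3000 API requests for issues, comments and user lookups.
print(hours_needed(3000, UNAUTHENTICATED_PER_HOUR))  # 50
print(hours_needed(3000, AUTHENTICATED_PER_HOUR))    # 1
```

With sleep-for-rate = true the collection pauses between windows rather than failing, so an unauthenticated run can look stalled for a long time.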

@valeriocos
Member

valeriocos commented Nov 7, 2019

There is a typo in projects.json, @rootsongjc. Could this be the root of your issue?

{
    "istio-handbook": {
        "git": [
            "https://github.com/servicemesher/istio-handbook"
        ],
        "gitub": [ <-------------- github
            "https://github.com/servicemesher/istio-handbook"
        ]
    }
}
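
A typo like this is easy to miss by eye, so a small check before starting the container can help. This is a minimal sketch, not part of GrimoireLab: the `unknown_sources` helper and the `KNOWN` set are assumptions, and `KNOWN` should be extended to match the data source sections in your own cfg files.

```python
import json


def unknown_sources(projects, known_sources):
    """Return (project, key) pairs whose data source key is not recognised."""
    return [
        (project, source)
        for project, sources in projects.items()
        for source in sources
        if source not in known_sources
    ]


# Assumption: only the data sources used in this thread.
KNOWN = {"git", "github", "github:repo"}

projects = json.loads("""
{
    "istio-handbook": {
        "git": ["https://github.com/servicemesher/istio-handbook"],
        "gitub": ["https://github.com/servicemesher/istio-handbook"]
    }
}
""")

print(unknown_sources(projects, KNOWN))  # [('istio-handbook', 'gitub')]
```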

@rootsongjc
Author

I don't see a GitHub token in your configuration for analyzing those repositories. The process may still be running, because the rate limit for unauthenticated users is 60 requests per hour, which runs out quickly.

I recommend adding your GitHub token to the [github] section with the api-token parameter.

[github]
api-token = xxxxxxxxxxx
raw_index = github_grimoirelab-raw
enriched_index = github_grimoirelab
# Sleep if the GitHub API rate is exhausted, wait until it is recovered
sleep-for-rate = true

I have already configured it in the first lines of the setup.cfg file.

@rootsongjc
Author

@valeriocos Thank you, that was indeed a typo. After fixing it, the github_grimoirelab index appeared, but it contains no documents.

yellow open git_grimoirelab-raw            f6s8DYd2TieLPIxMo4beTA 5 1 111  1 313.1kb 313.1kb
yellow open github_grimoirelab-raw         ImmJJ2SIRry1WyKCh2nMGA 5 1  37  1   1.2mb   1.2mb
yellow open github_grimoirelab_stats       CrLrXDdMQDO8JHqG1TvCMw 5 1   1  0   9.8kb   9.8kb
yellow open git_aoc_grimoirelab-enriched   Tim9zhBIQnyW_1Ooh1ebBA 5 1 899  0   2.1mb   2.1mb
yellow open github_grimoirelab             c5EeNP-uSOSNSgKqYBkPPw 5 1   0  0   1.2kb   1.2kb
yellow open git_onion_grimoirelab-enriched O_vHjZLeTDGvwIOvC2VQuw 5 1 100  0  82.9kb  82.9kb
yellow open .kibana                        BLarSmeOSK2J5_KVT2gHkA 1 1 186 31 284.5kb 284.5kb
yellow open git_grimoirelab                ViqKMZJKTmOtcmE46qF4Ng 5 1 111  0     1mb     1mb
yellow open github_grimoirelab_stats-raw   che2AAX1RZ-oRp6cXOGF2Q 5 1   1  0  68.8kb  68.8kb
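
For longer listings, empty indexes can be picked out by reading the docs.count column (the seventh field) of the `_cat/indices` output. A minimal sketch with a hypothetical `doc_counts` helper, run here on a shortened excerpt of the listing above:

```python
def doc_counts(cat_output):
    """Map index name -> docs.count from `_cat/indices` text output."""
    counts = {}
    for line in cat_output.splitlines():
        fields = line.split()
        # Expected columns: health status index uuid pri rep docs.count ...
        # A header row is skipped because its seventh field is not a number.
        if len(fields) >= 7 and fields[6].isdigit():
            counts[fields[2]] = int(fields[6])
    return counts


# Shortened excerpt of the listing above.
CAT = """
yellow open github_grimoirelab-raw ImmJJ2SIRry1WyKCh2nMGA 5 1  37 1 1.2mb 1.2mb
yellow open github_grimoirelab     c5EeNP-uSOSNSgKqYBkPPw 5 1   0 0 1.2kb 1.2kb
yellow open git_grimoirelab        ViqKMZJKTmOtcmE46qF4Ng 5 1 111 0   1mb   1mb
"""

empty = [name for name, count in doc_counts(CAT).items() if count == 0]
print(empty)  # ['github_grimoirelab']
```

Here the raw index has 37 documents while the enriched one has none, which points at the enrichment phase rather than at collection.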

@valeriocos
Member

You're welcome, @rootsongjc.

It could be related to min_update_delay = 300, which is the number of seconds between two mordred loops, or to an error in the enricher. Do you have some logs to share?
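
As an illustration of what such a delay means in practice, here is a minimal sketch. It is not SirMordred's actual code: the function name is hypothetical, and it assumes the delay is measured from the previous update.

```python
def seconds_until_next_update(last_update, now, min_update_delay=300):
    """Remaining wait before a new loop may start (0 if the delay has passed)."""
    elapsed = now - last_update
    return max(0, min_update_delay - elapsed)


# Timestamps are plain seconds, chosen only for illustration.
print(seconds_until_next_update(last_update=1000, now=1120))  # 180
print(seconds_until_next_update(last_update=1000, now=1400))  # 0
```

Under this reading, an index checked less than five minutes after the previous loop may simply not have been refreshed yet.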

@rootsongjc
Author

rootsongjc commented Nov 7, 2019

I can see the GrimoireLab backend making requests to GitHub in /logs/all.log, like these:

2019-11-07 12:53:07,792 - perceval.backends.core.github - DEBUG - Get GitHub paginated items from https://api.github.com/repos/servicemesher/istio-handbook/issues/29/comments
2019-11-07 12:53:08,188 - sirmordred.task_identities - INFO - [sortinghat] Loading identities from file /tmp/tmpby_4ln9j
2019-11-07 12:53:08,211 - sortinghat.command - INFO - Database grimoirelab_sh:localhost None set
2019-11-07 12:53:08,243 - urllib3.connectionpool - DEBUG - https://api.github.com:443 "GET /repos/servicemesher/istio-handbook/issues/29/comments?direction=asc&sort=updated&per_page=100 HTTP/1.1" 200 None
2019-11-07 12:53:08,257 - perceval.archive - DEBUG - Archiving 1aea139ab95cc30b13b6a8fa557149a2c72f649d with https://api.github.com/repos/servicemesher/istio-handbook/issues/29/comments {'direction': 'asc', 'sort': 'updated', 'per_page': 100} None in /home/grimoirelab/.perceval/archives/86/6963bbd2674431a8e86eabd834f67f.sqlite3
2019-11-07 12:53:08,274 - perceval.archive - DEBUG - 1aea139ab95cc30b13b6a8fa557149a2c72f649d data archived in /home/grimoirelab/.perceval/archives/86/6963bbd2674431a8e86eabd834f67f.sqlite3
2019-11-07 12:53:08,274 - perceval.client - DEBUG - Rate limit: 4612
2019-11-07 12:53:08,275 - perceval.client - DEBUG - Rate limit reset: 3410.0
2019-11-07 12:53:08,277 - perceval.backends.core.github - DEBUG - Getting info for https://api.github.com/users/zhongfox
2019-11-07 12:53:08,778 - urllib3.connectionpool - DEBUG - https://api.github.com:443 "GET /users/zhongfox HTTP/1.1" 200 None
2019-11-07 12:53:08,780 - perceval.archive - DEBUG - Archiving 2b98ff00d87840f63f8aba103fdbcc543a535e1b with https://api.github.com/users/zhongfox None None in /home/grimoirelab/.perceval/archives/86/6963bbd2674431a8e86eabd834f67f.sqlite3
2019-11-07 12:53:08,786 - perceval.archive - DEBUG - 2b98ff00d87840f63f8aba103fdbcc543a535e1b data archived in /home/grimoirelab/.perceval/archives/86/6963bbd2674431a8e86eabd834f67f.sqlite3

and these

2019-11-07 12:53:29,074 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): localhost:9200
2019-11-07 12:53:29,079 - urllib3.connectionpool - DEBUG - http://localhost:9200 "GET /git_grimoirelab HTTP/1.1" 200 822
2019-11-07 12:53:29,506 - urllib3.connectionpool - DEBUG - http://localhost:9200 "PUT /github_grimoirelab HTTP/1.1" 200 85
2019-11-07 12:53:29,507 - grimoire_elk.elastic - INFO - Created index http://localhost:9200/github_grimoirelab
2019-11-07 12:53:29,507 - grimoire_elk.elastic - DEBUG - http://localhost:9200/github_grimoirelab/_search
        { "size": 0, "query": {"bool": {"filter": []}},
            "aggs": {
                "1": {
                  "max": {
                    "field": "metadata__timestamp"
                  }
                }
            }

        }

and these

2019-11-07 13:30:08,333 - grimoire_elk.elastic_items - DEBUG - Fetching from http://localhost:9200/github_grimoirelab_stats-raw: done receiving
2019-11-07 13:30:08,684 - urllib3.connectionpool - DEBUG - http://localhost:9200 "PUT /github_grimoirelab_stats/items/_bulk?refresh=true HTTP/1.1" 200 287
2019-11-07 13:30:08,684 - grimoire_elk.elastic - DEBUG - 2 items uploaded to ES (http://localhost:9200/github_grimoirelab_stats/items/_bulk)
2019-11-07 13:30:08,693 - urllib3.connectionpool - DEBUG - http://localhost:9200 "GET /github_grimoirelab_stats HTTP/1.1" 200 562
2019-11-07 13:30:08,709 - urllib3.connectionpool - DEBUG - http://localhost:9200 "GET /github_grimoirelab_stats HTTP/1.1" 200 562
2019-11-07 13:30:08,718 - urllib3.connectionpool - DEBUG - http://localhost:9200 "PUT /github_grimoirelab_stats/items/_mapping HTTP/1.1" 200 47
2019-11-07 13:30:08,729 - urllib3.connectionpool - DEBUG - http://localhost:9200 "PUT /github_grimoirelab_stats/items/_mapping HTTP/1.1" 200 47

I found no errors in the logs.

@jgbarah
Contributor

jgbarah commented Nov 7, 2019

Your configuration worked for me as I describe below. Could you please check exactly this procedure?

I got your projects.json file (with the typo fixed), and a file mordred-credentials-jgb.cfg with just my GitHub credentials:

[github]
api-token = your_token_here

I put both of them in the same directory and, from there, ran the grimoirelab/full container as:

docker run -p 5601:5601 -p 9000:9200 -v $(pwd)/projects.json:/projects.json -v $(pwd)/mordred-credentials-jgb.cfg:/override.cfg -t grimoirelab/full

I get the following output:

Starting container: 6502a2d72caa
Starting Elasticsearch
[ ok ] Starting Elasticsearch Server:.
Waiting for Elasticsearch to start...
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN      -                   
Elasticsearch started
Starting MariaDB
[ ok ] Starting MariaDB database server: mysqld.
Waiting for MariaDB to start...
tcp6       0      0 :::3306                 :::*                    LISTEN      -                   
MariaDB started
Starting Kibiter
Waiting for Kibiter to start...
..Kibiter started
Starting SirMordred to build a GrimoireLab dashboard
This will usually take a while...
Collection for git: starting...
Collection for github: starting...
Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
Collection for git: finished after 00:00:04 hours
53/53 unique identities loaded
Collection for github: finished after 00:00:17 hours
Collection for git: starting...
Collection for git: finished after 00:00:00 hours
Collection for github: starting...
Collection for github: finished after 00:00:01 hours

At this point, I pointed my browser to http://localhost:5601, and saw GitHub and git data already in the dashboard (issues and pull requests for GitHub).

@valeriocos
Member

Thank you @jgbarah for the feedback.

In case it helps, I changed the configuration files a bit and was able to see the data too.

docker run -p 5601:5601 -p 9000:9200 -v $(pwd)/projects.json:/projects.json -v $(pwd)/dashboard.cfg:/dashboard.cfg -v $(pwd)/credentials.cfg:/override.cfg -t grimoirelab/full

The projects.json is:

{
    "istio-handbook": {
        "git": [
            "https://github.com/servicemesher/istio-handbook"
        ],
        "github": [
            "https://github.com/servicemesher/istio-handbook"
        ],
        "github:repo": [
            "https://github.com/servicemesher/istio-handbook"
        ]
    }
}

The credentials.cfg is:

[github]
api-token = 98c46a30df784...

[github:repo]
api-token = 98c46a30df784...

In dashboard.cfg, I only added no-archive = true to the [github] section:

[github]
# Names for raw and enriched indexes
raw_index = github_grimoirelab-raw
enriched_index = github_grimoirelab
sleep-for-rate = true
no-archive = true <-----

This is the info about the indexes:

yellow open github_grimoirelab_stats       M1TapcejSBaH8tPUS58gFA 5 1   5  0  71.1kb  71.1kb
yellow open git_onion_grimoirelab-enriched LSwjnpSWQ-GUBPZaNXNOvw 5 1 100  0 140.3kb 140.3kb
yellow open git_grimoirelab                4XHVBzfISeOUTZXdO-WIvw 5 1 111  0   2.1mb   2.1mb
yellow open .kibana                        sabH_BTvQti9SgNL0dezfQ 1 1 186 29 295.1kb 295.1kb
yellow open github_grimoirelab             GHPFILlsTHyakC90j3eYLw 5 1  37  1 946.3kb 946.3kb
yellow open github_grimoirelab-raw         SqKKugoBTwiTb5cDc7c_IQ 5 1  37  1   1.6mb   1.6mb
yellow open git_grimoirelab-raw            iQoL4OZgRX-1XEo3HubU_w 5 1 111  1 340.4kb 340.4kb
yellow open git_aoc_grimoirelab-enriched   q-lPoepJRA6Ab2HsYt5wcA 5 1 899  0   2.9mb   2.9mb
yellow open github_grimoirelab_stats-raw   rcUhY73vRc-avcK2Z26QOw 5 1   5  0 343.9kb 343.9kb

@rootsongjc
Author

rootsongjc commented Nov 8, 2019

@valeriocos I used the same configuration on my Mac and on Alibaba Cloud to set up GrimoireLab. On my Mac, it took a long time for the collected GitHub data to show up, and on Alibaba Cloud there was no GitHub data at all, although I could see the github_grimoirelab index.

Sometimes the github_grimoirelab index did not exist at all.

health status index                          uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   github_grimoirelab-raw         jRFSot6gQmallA-S6YOplw   5   1       4000            0     48.6mb         48.6mb
yellow open   git_onion_grimoirelab-enriched _Mm4kyolSQWFXaz7XeMuOQ   5   1       7202            0      2.1mb          2.1mb
yellow open   .kibana                        _h_yQu6hQeiqu8hytnNROw   1   1        186           29    266.7kb        266.7kb
yellow open   git_grimoirelab-raw            kOY5JamrSziqZxLHb6erww   5   1      15743            2     34.4mb         34.4mb
yellow open   git_aoc_grimoirelab-enriched   89lRfkOcRxqEOjUH0mqP9w   5   1     159928        22736    267.9mb        267.9mb
yellow open   github_grimoirelab_stats       j9rHS9S4S4-HP6mk7RNo7g   5   1        240            6    556.8kb        556.8kb
yellow open   github_grimoirelab_stats-raw   X7TlB5CiRN6h9LzqlBx_gg   5   1        240            0      4.2mb          4.2mb
yellow open   git_grimoirelab                dOzCLgDkShySvZ2Q18mldg   5   1      15743         5348     92.1mb         92.1mb

@rootsongjc
Author

I finally found the cause: servers in Alibaba Cloud's China regions can't access some Google services, so the program can't continue to run. After switching to an overseas server, it works.

@valeriocos
Member

Thank you @rootsongjc for your feedback!

@rootsongjc rootsongjc reopened this Nov 8, 2019
@rootsongjc
Author

rootsongjc commented Nov 8, 2019

@valeriocos After I switched to an overseas server, it only works for small repositories. When I try to analyze a large one, such as Kubernetes or Istio, it doesn't work.

At which step is the github_grimoirelab index generated?

@valeriocos
Member

Sorry for the late reply, @rootsongjc!

When you say it didn't work, do you mean that you don't see the enriched index? If so, it's probably due to the large size of the repos. In GrimoireLab, enrichment starts after collection, so for large repos you may need to wait a while. To handle these cases, you can pass more GitHub tokens via the credentials.cfg file as follows:
api-token = [db...., 02..., ...]

Note that the tokens should be generated by different GitHub accounts. Tokens from the same account share a single rate limit, that is, the same total number of requests.
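
The effect of using different accounts can be sketched with a little arithmetic. This is only an illustration: the token and account names are made up, and the 5000 requests per hour per account is an assumption based on GitHub's usual limit for authenticated requests.

```python
PER_ACCOUNT_LIMIT = 5000  # assumption: usual hourly limit per GitHub account


def hourly_budget(token_owners):
    """Total hourly request budget for a mapping of token -> owning account.

    Tokens that belong to the same account count only once.
    """
    return PER_ACCOUNT_LIMIT * len(set(token_owners.values()))


# Hypothetical tokens: two from the same account, one from another.
tokens = {"token-a": "alice", "token-b": "alice", "token-c": "bob"}
print(hourly_budget(tokens))  # 10000
```

Three tokens from two accounts give the budget of two accounts, which is why extra tokens from one account do not speed up collection.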

@rootsongjc
Author

Yes, by "it didn't work" I mean there was no github_grimoirelab index. I am using GrimoireLab to analyze large repositories. Thank you for your reply, @valeriocos.

@valeriocos
Member

@rootsongjc, please let us know whether this issue gets solved, and don't hesitate to report any errors. Thanks!

@rootsongjc
Author

@valeriocos It takes a long time to run, but it works now. I think the program may have some problems with concurrent performance.
