._rawData vs. .raw_data - Lazy Loading #2202

Open
phipz opened this issue Mar 9, 2022 · 5 comments

phipz commented Mar 9, 2022

I am confused about the difference between the ._rawData and .raw_data attributes of GithubObjects.

While the ._rawData attribute returns the stored data instantly, the .raw_data attribute tries to re-download the content, even though it is already in memory.

Let me provide a minimal example:

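import os

from github import Github

# Assumption: the token is read from the GITHUB_TOKEN environment variable
GitHubToken = os.environ["GITHUB_TOKEN"]

g = Github(GitHubToken, per_page=1000)
repo = g.get_repo("PyGithub/PyGithub")
issues = repo.get_issues()

# Download the issues into memory
issues_list = []
for issue in issues:
    issues_list.append(issue)

# --- Block network access for the Python process here ---

issues_list[0]._rawData
# WORKS: returns the data already held in memory

issues_list[0].raw_data
# ERROR -> Failed to establish a new connection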
Contributor

Felixoid commented Jul 13, 2022

Hello, I ran into the same issue. The main problem, as I see it, is that .raw_data spends one additional API request instead of reusing the data the object already holds.

Here is an example from my case:
In [336]: api_prs = gh.search_issues('', sort="created", type='pr', repo='ClickHouse/ClickHouse', updated="2022-05-02..2022-05-03")

In [337]: gh.get_rate_limit().raw_data
Out[337]: 
{'core': {'limit': 5000, 'used': 6, 'remaining': 4994, 'reset': 1657708420},
 'search': {'limit': 30, 'used': 2, 'remaining': 28, 'reset': 1657705138},
 'graphql': {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1657708682},
 'integration_manifest': {'limit': 5000,
  'used': 0,
  'remaining': 5000,
  'reset': 1657708682},
 'source_import': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705142},
 'code_scanning_upload': {'limit': 1000,
  'used': 0,
  'remaining': 1000,
  'reset': 1657708682},
 'actions_runner_registration': {'limit': 10000,
  'used': 0,
  'remaining': 10000,
  'reset': 1657708682},
 'scim': {'limit': 15000, 'used': 0, 'remaining': 15000, 'reset': 1657708682},
 'dependency_snapshots': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705142}}

In [338]: api_prs[1]._rawData
Out[338]: 
{'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881',
 'repository_url': 'https://api.github.com/repos/ClickHouse/ClickHouse',
 'labels_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/labels{/name}',
 'comments_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/comments',
 'events_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/events',
 'html_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881',
 'id': 1224375411,
 'node_id': 'PR_kwDOA5dJV843PeWA',
 'number': 36881,
 'title': 'update docs for time window functions',
 'user': {'login': 'serxa',
  'id': 1014716,
  'node_id': 'MDQ6VXNlcjEwMTQ3MTY=',
  'avatar_url': 'https://avatars.githubusercontent.com/u/1014716?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/serxa',
  'html_url': 'https://github.com/serxa',
  'followers_url': 'https://api.github.com/users/serxa/followers',
  'following_url': 'https://api.github.com/users/serxa/following{/other_user}',
  'gists_url': 'https://api.github.com/users/serxa/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/serxa/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/serxa/subscriptions',
  'organizations_url': 'https://api.github.com/users/serxa/orgs',
  'repos_url': 'https://api.github.com/users/serxa/repos',
  'events_url': 'https://api.github.com/users/serxa/events{/privacy}',
  'received_events_url': 'https://api.github.com/users/serxa/received_events',
  'type': 'User',
  'site_admin': False},
 'labels': [{'id': 1310920248,
   'node_id': 'MDU6TGFiZWwxMzEwOTIwMjQ4',
   'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/labels/pr-documentation',
   'name': 'pr-documentation',
   'color': '007700',
   'default': False,
   'description': 'Documentation PRs for the specific code PR'}],
 'state': 'closed',
 'locked': False,
 'assignee': None,
 'assignees': [],
 'milestone': None,
 'comments': 0,
 'created_at': '2022-05-03T17:13:34Z',
 'updated_at': '2022-05-03T18:11:12Z',
 'closed_at': '2022-05-03T18:11:11Z',
 'author_association': 'MEMBER',
 'active_lock_reason': None,
 'draft': False,
 'pull_request': {'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/pulls/36881',
  'html_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881',
  'diff_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881.diff',
  'patch_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881.patch',
  'merged_at': '2022-05-03T18:11:11Z'},
 'body': '### Changelog category (leave one):\r\n- Documentation (changelog entry is not required)\r\n\r\n\r\n### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):\r\n...\r\n\r\n\r\n> Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/\r\n',
 'reactions': {'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/reactions',
  'total_count': 0,
  '+1': 0,
  '-1': 0,
  'laugh': 0,
  'hooray': 0,
  'confused': 0,
  'heart': 0,
  'rocket': 0,
  'eyes': 0},
 'timeline_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/timeline',
 'performed_via_github_app': None,
 'state_reason': None,
 'score': 1.0}

In [339]: gh.get_rate_limit().raw_data
Out[339]: 
{'core': {'limit': 5000, 'used': 6, 'remaining': 4994, 'reset': 1657708420},
 'search': {'limit': 30, 'used': 2, 'remaining': 28, 'reset': 1657705138},
 'graphql': {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1657708697},
 'integration_manifest': {'limit': 5000,
  'used': 0,
  'remaining': 5000,
  'reset': 1657708697},
 'source_import': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705157},
 'code_scanning_upload': {'limit': 1000,
  'used': 0,
  'remaining': 1000,
  'reset': 1657708697},
 'actions_runner_registration': {'limit': 10000,
  'used': 0,
  'remaining': 10000,
  'reset': 1657708697},
 'scim': {'limit': 15000, 'used': 0, 'remaining': 15000, 'reset': 1657708697},
 'dependency_snapshots': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705157}}

In [340]: api_prs[1].raw_data
Out[340]: 
{'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881',
 'repository_url': 'https://api.github.com/repos/ClickHouse/ClickHouse',
 'labels_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/labels{/name}',
 'comments_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/comments',
 'events_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/events',
 'html_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881',
 'id': 1224375411,
 'node_id': 'PR_kwDOA5dJV843PeWA',
 'number': 36881,
 'title': 'update docs for time window functions',
 'user': {'login': 'serxa',
  'id': 1014716,
  'node_id': 'MDQ6VXNlcjEwMTQ3MTY=',
  'avatar_url': 'https://avatars.githubusercontent.com/u/1014716?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/serxa',
  'html_url': 'https://github.com/serxa',
  'followers_url': 'https://api.github.com/users/serxa/followers',
  'following_url': 'https://api.github.com/users/serxa/following{/other_user}',
  'gists_url': 'https://api.github.com/users/serxa/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/serxa/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/serxa/subscriptions',
  'organizations_url': 'https://api.github.com/users/serxa/orgs',
  'repos_url': 'https://api.github.com/users/serxa/repos',
  'events_url': 'https://api.github.com/users/serxa/events{/privacy}',
  'received_events_url': 'https://api.github.com/users/serxa/received_events',
  'type': 'User',
  'site_admin': False},
 'labels': [{'id': 1310920248,
   'node_id': 'MDU6TGFiZWwxMzEwOTIwMjQ4',
   'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/labels/pr-documentation',
   'name': 'pr-documentation',
   'color': '007700',
   'default': False,
   'description': 'Documentation PRs for the specific code PR'}],
 'state': 'closed',
 'locked': False,
 'assignee': None,
 'assignees': [],
 'milestone': None,
 'comments': 0,
 'created_at': '2022-05-03T17:13:34Z',
 'updated_at': '2022-05-03T18:11:12Z',
 'closed_at': '2022-05-03T18:11:11Z',
 'author_association': 'MEMBER',
 'active_lock_reason': None,
 'draft': False,
 'pull_request': {'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/pulls/36881',
  'html_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881',
  'diff_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881.diff',
  'patch_url': 'https://github.com/ClickHouse/ClickHouse/pull/36881.patch',
  'merged_at': '2022-05-03T18:11:11Z'},
 'body': '### Changelog category (leave one):\r\n- Documentation (changelog entry is not required)\r\n\r\n\r\n### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):\r\n...\r\n\r\n\r\n> Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/\r\n',
 'closed_by': {'login': 'serxa',
  'id': 1014716,
  'node_id': 'MDQ6VXNlcjEwMTQ3MTY=',
  'avatar_url': 'https://avatars.githubusercontent.com/u/1014716?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/serxa',
  'html_url': 'https://github.com/serxa',
  'followers_url': 'https://api.github.com/users/serxa/followers',
  'following_url': 'https://api.github.com/users/serxa/following{/other_user}',
  'gists_url': 'https://api.github.com/users/serxa/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/serxa/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/serxa/subscriptions',
  'organizations_url': 'https://api.github.com/users/serxa/orgs',
  'repos_url': 'https://api.github.com/users/serxa/repos',
  'events_url': 'https://api.github.com/users/serxa/events{/privacy}',
  'received_events_url': 'https://api.github.com/users/serxa/received_events',
  'type': 'User',
  'site_admin': False},
 'reactions': {'url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/reactions',
  'total_count': 0,
  '+1': 0,
  '-1': 0,
  'laugh': 0,
  'hooray': 0,
  'confused': 0,
  'heart': 0,
  'rocket': 0,
  'eyes': 0},
 'timeline_url': 'https://api.github.com/repos/ClickHouse/ClickHouse/issues/36881/timeline',
 'performed_via_github_app': None,
 'state_reason': None}

In [341]: gh.get_rate_limit().raw_data
Out[341]: 
{'core': {'limit': 5000, 'used': 7, 'remaining': 4993, 'reset': 1657708420},
 'search': {'limit': 30, 'used': 2, 'remaining': 28, 'reset': 1657705138},
 'graphql': {'limit': 5000, 'used': 0, 'remaining': 5000, 'reset': 1657708710},
 'integration_manifest': {'limit': 5000,
  'used': 0,
  'remaining': 5000,
  'reset': 1657708710},
 'source_import': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705170},
 'code_scanning_upload': {'limit': 1000,
  'used': 0,
  'remaining': 1000,
  'reset': 1657708710},
 'actions_runner_registration': {'limit': 10000,
  'used': 0,
  'remaining': 10000,
  'reset': 1657708710},
 'scim': {'limit': 15000, 'used': 0, 'remaining': 15000, 'reset': 1657708710},
 'dependency_snapshots': {'limit': 100,
  'used': 0,
  'remaining': 100,
  'reset': 1657705170}}

The objects returned by raw_data and _rawData differ slightly: raw_data has an additional 'closed_by' key and lacks the 'score' key from the search result, but that is not the problem in my case.

How can I avoid spending hundreds or thousands of API requests per program launch without accessing the protected attribute _rawData?
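
For reference, the cost in the session above can be read off the rate-limit snapshots: the 'core' counter stays at 6 across the ._rawData access and moves to 7 after .raw_data. A minimal sketch of that comparison, working on plain dicts shaped like the get_rate_limit().raw_data output shown above. The helper name requests_spent is made up for illustration and is not part of PyGithub.

```python
def requests_spent(before: dict, after: dict) -> dict:
    """Return, per rate-limit category, the change in 'used' between two snapshots.

    Both arguments are dicts shaped like the gh.get_rate_limit().raw_data
    output in the session above. Categories with no change are omitted.
    Note: if a rate-limit window resets between the snapshots, the
    difference can be negative; this sketch does not handle that case.
    """
    spent = {}
    for category, counters in after.items():
        delta = counters["used"] - before.get(category, {}).get("used", 0)
        if delta:
            spent[category] = delta
    return spent


# 'used' values copied from the session above, trimmed to two categories.
snapshot_337 = {"core": {"used": 6}, "search": {"used": 2}}
snapshot_339 = {"core": {"used": 6}, "search": {"used": 2}}  # taken after ._rawData
snapshot_341 = {"core": {"used": 7}, "search": {"used": 2}}  # taken after .raw_data

print(requests_spent(snapshot_337, snapshot_339))  # {}
print(requests_spent(snapshot_339, snapshot_341))  # {'core': 1}
```

Multiplied over every result of a search, that single extra 'core' request per object is where the hundreds or thousands of requests come from.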

@Felixoid
Contributor

Excuse me, dear @jacquev6, @adamtheturtle, and @sfdye (mentioning you because you are active on GitHub).

Don't you consider it an issue that accessing .raw_data spends additional API requests and counts against the rate limit?

@Felixoid
Contributor

I assume that, for the same reason, one can't read .repository on a gh.search_issues(...) result without spending a request either.

Any thoughts, please?

Contributor

Felixoid commented Aug 3, 2022

This issue forced me to write our own wrapper to work around the problem, so we spend an order of magnitude fewer requests than we otherwise would: https://github.com/ClickHouse/ClickHouse/blob/469b7e7/tests/ci/github_helper.py#L102

It also uses local caching, which further reduces the number of requests many times over.
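
The linked helper is not reproduced here. As a rough illustration of the general idea only, the sketch below keeps full payloads in a local dict and refetches an item only when its updated_at differs from the cached copy. The class IssueCache, the fetch_full callable, and the choice of updated_at as the freshness check are all assumptions made for this example, not the wrapper's actual design.

```python
class IssueCache:
    """Keep full issue payloads locally; refetch only when updated_at changes."""

    def __init__(self, fetch_full):
        # Hypothetical callable: issue number -> full payload dict (one API request).
        self._fetch_full = fetch_full
        # issue number -> cached full payload
        self._store = {}

    def get(self, partial):
        """Return the full payload for a partial (list or search) payload."""
        number = partial["number"]
        cached = self._store.get(number)
        if cached is not None and cached["updated_at"] == partial["updated_at"]:
            return cached  # cache hit: no request spent
        full = self._fetch_full(number)
        self._store[number] = full
        return full


fetch_log = []


def fake_fetch(number):
    # Stand-in for a real API call; records each simulated request.
    fetch_log.append(number)
    return {
        "number": number,
        "updated_at": "2022-05-03T18:11:12Z",
        "closed_by": {"login": "serxa"},
    }


cache = IssueCache(fake_fetch)
partial = {"number": 36881, "updated_at": "2022-05-03T18:11:12Z"}

cache.get(partial)
cache.get(partial)
print(len(fetch_log))  # 1
```

A persistent variant would store the payloads on disk between program launches, so that unchanged items cost no requests at all on the next run.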

@EnricoMi
Collaborator

I agree, it is surprising that raw_data triggers a request for completable objects with completed==False. Changing this would be quite a breaking change. We could make _rawData available via current_raw_data, and we should document this behaviour of raw_data clearly in the API documentation.
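
To make the behaviour under discussion concrete, here is a toy model of a completable object. It is a simplified stand-in written for this thread, not PyGithub's source: LazyObject, fetch_full, and fake_request are invented names, and current_raw_data is only the accessor proposed above, not an existing API.

```python
class LazyObject:
    """Toy stand-in for a completable object; not PyGithub's actual code."""

    def __init__(self, partial_data, fetch_full):
        self._rawData = dict(partial_data)  # data from the list or search response
        self._fetch_full = fetch_full       # callable simulating one API request
        self._completed = False

    def _complete_if_needed(self):
        if not self._completed:
            self._rawData = self._fetch_full()
            self._completed = True

    @property
    def raw_data(self):
        # Models the behaviour described in this thread: complete first, then return.
        self._complete_if_needed()
        return self._rawData

    @property
    def current_raw_data(self):
        # Hypothetical accessor as proposed above: never triggers a request.
        return self._rawData


calls = []


def fake_request():
    # Records each simulated request and returns a "full" payload.
    calls.append("GET issue 36881")
    return {"number": 36881, "closed_by": {"login": "serxa"}}


issue = LazyObject({"number": 36881, "score": 1.0}, fake_request)

print(issue.current_raw_data)  # {'number': 36881, 'score': 1.0}
print(len(calls))              # 0
print(issue.raw_data)          # {'number': 36881, 'closed_by': {'login': 'serxa'}}
print(len(calls))              # 1
print(len(issue.raw_data))     # 2
print(len(calls))              # 1
```

In this model the first raw_data access costs one request and replaces the partial payload, which matches the session earlier in the thread where 'closed_by' appears and 'score' disappears after the access.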
