Skip to content

Conversation

@tom-pytel
Copy link
Contributor

@tom-pytel tom-pytel commented Dec 11, 2020

Started on new plugin for aiohttp, requester works but still need to add server component. Again hoping you can handle the tests for this. Will see about adding component ID and maybe logo myself when it is done.

  • Add a test case for the new plugin
  • Add a component id in the main repo
  • Add a logo in the UI repo
  • Rebuild the requirements.txt by running tools/env/build_requirements_(linux|windows).sh

@tom-pytel
Copy link
Contributor Author

tom-pytel commented Dec 11, 2020

Here is a very async test snippet:

import asyncio
import aiohttp

from skywalking import agent
from skywalking.decorators import trace
from skywalking.trace.context import get_context
import skywalking.trace.context as context

cities = ['paris', 'lyon', 'nice']

agent.start()

async def child(i):
    async with aiohttp.ClientSession() as session:
        url = f"https://geo.api.gouv.fr/communes?nom={cities[i]}&fields=nom,region&format=json&geometry=centr"

        async with session.get(url) as response:
            return await response.json()

loop = asyncio.get_event_loop()

with get_context().new_local_span('/parent'):
    loop.run_until_complete(
        asyncio.gather(
            *(trace(f'child{i}')(child)(i) for i in range(3))
        )
    )

url = URL(str_or_url)
peer = \
(url.scheme + '://' if url.scheme else '') + \
((url.user or '') + ':' + (url.password or '') + '@' if url.user or url.password else '') + \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think we want to include password here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its probably better not to. The thing is I copied this behavior from the other plugins so they will need to be changed to explicitly remove the password as well since urllib.parse always returns it as part of netloc if it is present and so would be recorded. Then the Node agent also, as well I am wondering if the Java plugins record this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its probably better not to. The thing is I copied this behavior from the other plugins so they will need to be changed to explicitly remove the password as well since urllib.parse always returns it as part of netloc if it is present and so would be recorded. Then the Node agent also, as well I am wondering if the Java plugins record this?

The Java agent doesn't record this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java agent doesn't record this

So just Python and Node, both incoming and outgoing, http and any other protocols (layers)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java agent doesn't record this

So just Python and Node, both incoming and outgoing, http and any other protocols (layers)?

Do you mean peer or password? We usually only record host/ip and port in peer

Copy link
Contributor Author

@tom-pytel tom-pytel Dec 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for example in sw_requests.py, the following code will store the user and password if I am not mistaken:

    def _sw_request(this: Session, method, url,
                    params=None, data=None, headers=None, cookies=None, files=None,
                    auth=None, timeout=None, allow_redirects=True, proxies=None,
                    hooks=None, stream=None, verify=None, cert=None, json=None):

        from urllib.parse import urlparse
        url_param = urlparse(url)

        ...

        context = get_context()
        with context.new_exit_span(op=url_param.path or "/", peer=url_param.netloc) as span:

I think this is repeated across several plugins.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to review these plugins and if they really did store username/password, we need to remove them explicitly, usually as the name peer indicates, we only store host/IP and port into it

Comment on lines 47 to 50
if headers is None:
headers = CIMultiDict()
elif not isinstance(headers, (MultiDictProxy, MultiDict)):
headers = CIMultiDict(headers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should put headers back to kwargs? Also, is it safe to "override" the existing headers if it's not MultiDic(Proxy)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put headers back to kwargs, yes I meant to do this but forgot, this is why reviews are good 😃

As for "overriding", as far as I can see it should be ok since aiohttp doesn't do anything with the headers until it executes the following in ClientSession._prepare_headers():

if not isinstance(headers, (MultiDictProxy, MultiDict)):
    headers = CIMultiDict(headers)

Which is actually where I got it from, and yes CIMultiDict is an instance of MultiDict so I am just executing what would be done anyway a little earlier.

@tom-pytel
Copy link
Contributor Author

Test snippet:

from aiohttp import web
from skywalking import agent

agent.start()

async def handle(request):
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)

app = web.Application()
app.add_routes([web.get('/', handle),
                web.get('/{name}', handle)])

if __name__ == '__main__':
    web.run_app(app, port=8000)

Am I missing anything to be stored or handled?

@kezhenxu94
Copy link
Member

Test snippet:

from aiohttp import web
from skywalking import agent

agent.start()

async def handle(request):
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)

app = web.Application()
app.add_routes([web.get('/', handle),
                web.get('/{name}', handle)])

if __name__ == '__main__':
    web.run_app(app, port=8000)

Am I missing anything to be stored or handled?

I’ll test locally, also, can you please uncomment the PR template so that we can check the items one by one (some may be checked by me though)

@kezhenxu94
Copy link
Member

Hi @tom-pytel , sorry for closing this by mistake because of a force push, can you please reopen another PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants